MiniRISC® CW4011 Superscalar Microprocessor Core Technical Manual
Order Number C14040.A
A CoreWare® Product
Document DB14-000064-01, Second Edition (May 1999)

This document describes revision A of LSI Logic Corporation's MiniRISC® CW4011 Superscalar Microprocessor Core and will remain the official reference source for all revisions/releases of this product until rescinded by an update.

To receive product literature, call us at 1.800.574.4286 (U.S. and Canada); +32.11.300.531 (Europe); 408.433.7700 (outside U.S., Canada, and Europe) and ask for Department JDS; or visit us at http://www.lsilogic.com.

LSI Logic Corporation reserves the right to make changes to any products herein at any time without notice. LSI Logic does not assume any responsibility or liability arising out of the application or use of any product described herein, except as expressly agreed to in writing by LSI Logic; nor does the purchase or use of a product from LSI Logic convey a license under any patent rights, copyrights, trademark rights, or any other of the intellectual property rights of LSI Logic or third parties.

Copyright © 1996-1999 by LSI Logic Corporation. All rights reserved.

Trademark Acknowledgment

LSI Logic logo design, MiniRISC, MiniSIM, ATMizer, and CoreWare are registered trademarks and GigaBlaze, G10, Right-First-Time, and SerialICE are trademarks of LSI Logic Corporation. Sun and SPARCstation are trademarks of Sun Microsystems, Inc. SPARC is a registered trademark of SPARC International, Inc. Products bearing the SPARC trademarks are based on an architecture developed by Sun Microsystems, Inc. MIPS is a trademark of MIPS Technologies, Inc. Verilog is a registered trademark of Cadence Design Systems, Inc. All other brand and product names may be trademarks of their respective companies.
Contents

Preface

Chapter 1 Introduction
  1.1 CW4011 Overview
  1.2 CW4011 Core and Building Blocks
    1.2.1 CW4011 Core
    1.2.2 CW4011 Shell
    1.2.3 Other CW4011 Components
    1.2.4 Interfaces
    1.2.5 Related Modules
  1.3 Features
  1.4 CoreWare Program

Chapter 2 Architectural Overview
  2.1 Architectural Overview
  2.2 Cache and External Interface
  2.3 Clocking and Power Management
  2.4 Pipeline Architecture
    2.4.1 Instruction Fetch and Scheduling
    2.4.2 Instruction Execution
  2.5 Instruction Set Summary
  2.6 Configurability and Options
    2.6.1 Cache Sizes
    2.6.2 High-Performance Multiply Accumulate Unit
    2.6.3 64-Bit vs. 32-Bit Memory Interface
    2.6.4 Memory Management Unit

Chapter 3 Instruction Set
  3.1 Instruction Set Formats
  3.2 Load and Store Instructions
  3.3 Computational Instructions
  3.4 Jump and Branch Instructions
  3.5 Trap Instructions
  3.6 Special Instructions
  3.7 Coprocessor Instructions
  3.8 System Control Coprocessor (CP0) Instructions
  3.9 Cache Maintenance Instructions
  3.10 CW4011 Instruction Set Extensions
  3.11 CPU Instruction Opcode Bit Encoding

Chapter 4 CW4011 Exception Processing
  4.1 Overview
  4.2 R3000 Exception Compatibility Mode
  4.3 Exception Handling Registers
    4.3.1 Context Register
    4.3.2 Debug Control and Status (DCS) Register
    4.3.3 Bad Virtual Address (BadVAddr) Register
    4.3.4 Count Register
    4.3.5 Compare Register
    4.3.6 Status Register
    4.3.7 Cause Register
    4.3.8 Exception Program Counter (EPC) Register
    4.3.9 Processor Revision Identifier (PRId) Register
    4.3.10 Configuration and Cache Control (CCC) Register
    4.3.11 Load Linked Address (LLAddr) Register
    4.3.12 Breakpoint Program Counter (BPC) Register
    4.3.13 Breakpoint Data Address (BDA) Register
    4.3.14 Breakpoint PC Mask (BPCM) Register
    4.3.15 Breakpoint Data Address Mask (BDAM) Register
    4.3.16 Rotate Register
    4.3.17 Circular Mask (CMask) Register
    4.3.18 Error Exception Program Counter (Error EPC) Register
  4.4 Exception Description Details
    4.4.1 Exception Operation
    4.4.2 Precision of Exceptions
    4.4.3 Exception Vector Locations
    4.4.4 Priority of Exceptions
    4.4.5 Reset Exceptions
    4.4.6 Interrupt Exceptions
    4.4.7 Address Error Exception
    4.4.8 TLB Exceptions
    4.4.9 Bus Error Exception
    4.4.10 Integer Overflow Exception
    4.4.11 Trap Exception
    4.4.12 System Call Exception
    4.4.13 Breakpoint Exception
    4.4.14 Reserved Instruction Exception
    4.4.15 Floating-Point Exception
    4.4.16 Coprocessor Unusable Exception
    4.4.17 Debug Exception

Chapter 5 CW4011 Memory Management
  5.1 TLB Physical Organization
  5.2 Memory Management System
    5.2.1 Operating Modes
    5.2.2 User Mode Virtual Addressing
    5.2.3 Kernel Mode Virtual Addressing
  5.3 Virtual Memory and the TLB
    5.3.1 TLB Entry Format
    5.3.2 TLB Support Registers
    5.3.3 Virtual Address Translation
    5.3.4 TLB Instructions

Chapter 6 CW4011 Caches
  6.1 Cache Memory Organization
  6.2 Cache States
    6.2.1 I-Cache and Writethrough D-Cache
    6.2.2 Writeback D-Cache
  6.3 Address and Cache Tag
  6.4 Cache Scratchpad RAM Mode
  6.5 External Invalidation
  6.6 Cache Instructions
    6.6.1 Flush (All Cache Invalidation)
    6.6.2 Writeback
    6.6.3 Cache Maintenance by CCC Register

Chapter 7 CW4011 Signals
  7.1 CW4011 Core Signal Interfaces
  7.2 Control Interface
  7.3 SCbus Interface
  7.4 OCAbus Interface
  7.5 Coprocessor Interface
  7.6 Cache Invalidation Interface
  7.7 Data Cache Interface
    7.7.1 D-Cache Tag RAM Signals
    7.7.2 D-Cache Data RAM Signals
  7.8 Instruction Cache Interface
    7.8.1 I-Cache Tag RAM Signals
    7.8.2 I-Cache RAM Signals
    7.8.3 I-Cache Least Recently Used (LRU) RAM Signals
  7.9 Writeback Buffer Interface
  7.10 Memory Management Unit (MMU) Interface
  7.11 MMU to Shell Interface
  7.12 Multiply/Divide Unit (MDU) Interface
  7.13 Miscellaneous Signals

Chapter 8 Interface Operation
  8.1 Reset and Exception Signals
    8.1.1 Cold Reset (cresetn)
    8.1.2 Warm Reset (wresetn)
    8.1.3 Nonmaskable Interrupt (nmin)
    8.1.4 Bus Error (scberrn)
    8.1.5 External Interrupts (extintn)
    8.1.6 External Vectored Interrupt (exvintn)
    8.1.7 WAITI Instruction and cpzstallp
  8.2 SCbus Interface Behavior
    8.2.1 SCbus Basic Transaction
    8.2.2 SCbus Burst Transaction
    8.2.3 SCbus In-Page Write Transaction
    8.2.4 SCbus Bus Hold
    8.2.5 SCbus Bus Retry
    8.2.6 SCbus Bus Error
    8.2.7 SCbus Bus Sizing
    8.2.8 SCbus Bus Lock
    8.2.9 Big-Endian Configuration
  8.3 OCAbus Interface Behavior
    8.3.1 Basic OCAbus Transaction
    8.3.2 OCAbus Transaction Rejected
    8.3.3 OCAbus Access with Stall at EX Stage
    8.3.4 OCAbus Access with Stall at CR Stage
    8.3.5 OCAbus Access with Stall Request
    8.3.6 OCAbus Access with Pipeline Cancel
  8.4 Cache Interface Behavior

Chapter 9 ICEport
  9.1 Overview
  9.2 ICEport Features
  9.3 ICEport Functional Blocks
    9.3.1 Receive and Transmit Interface Logic
    9.3.2 Generic Interface Logic
    9.3.3 SCbus Interface Logic
  9.4 ICEport Signals
    9.4.1 Monitored SCbus Signals
    9.4.2 Other SCbus Signals
    9.4.3 ICEport Scan and Clocking Signals
  9.5 ICEport Registers
    9.5.1 Rx Status Register
    9.5.2 Rx Setup Register
    9.5.3 Rx Data Register
    9.5.4 Tx Status Register
    9.5.5 Tx Data Register
  9.6 ICEport Operations
    9.6.1 SCbus Read/Write Transactions
    9.6.2 Reset
    9.6.3 Serial Bit Stream
    9.6.4 ICEport Receive and Transmit
    9.6.5 Clock Domains and Properties
  9.7 ICEport Pin Buffers and Drivers

Chapter 10 Specifications
  10.1 Physical Specifications
  10.2 AC Timing and Loading

Appendix A CW4011 Register Summary
  A.1 CW4011 CPU Registers
  A.2 Register Summary

Appendix B Cache Sizing and Design Concerns
  B.1 CW4011 I-Cache Configurations
  B.2 CW4011 I-Cache Interface
  B.3 I-Cache Shell
  B.4 I-Cache Set Associative RAM Hookup
    B.4.1 2-Kbyte I-Cache Set Associative Connections
    B.4.2 4-Kbyte I-Cache Set Associative Connections
    B.4.3 8-Kbyte, Set Associative Cache, Hookup
    B.4.4 16-Kbyte I-Cache Set Associative Connections
  B.5 I-Cache Direct-Mapped RAM
    B.5.1 1-Kbyte I-Cache Direct-Mapped Connections
    B.5.2 2-Kbyte I-Cache Direct-Mapped Connections
    B.5.3 4-Kbyte I-Cache Direct-Mapped Connections
    B.5.4 8-Kbyte I-Cache Direct-Mapped Connections
    B.5.5 Instruction RAM
  B.6 CW4011 D-Cache Configurations
  B.7 CW4011 D-Cache Interface
  B.8 D-Cache Shell
  B.9 D-Cache Set Associative RAM Hookup
    B.9.1 2-Kbyte D-Cache Set Associative, Writeback Connections
    B.9.2 4-Kbyte D-Cache Set Associative, Writeback Connections
    B.9.3 8-Kbyte D-Cache Set Associative, Writeback Connections
    B.9.4 16-Kbyte D-Cache Set Associative, Writeback Connections
    B.9.5 2-Kbyte D-Cache Set Associative, Writethrough Connections
    B.9.6 4-Kbyte D-Cache Set Associative, Writethrough Connections
    B.9.7 8-Kbyte D-Cache Set Associative, Writethrough Connections
    B.9.8 16-Kbyte D-Cache Set Associative, Writethrough Connections
  B.10 D-Cache Direct-Mapped RAM Hookup
    B.10.1 1-Kbyte D-Cache Direct-Mapped, Writeback Connections
    B.10.2 2-Kbyte D-Cache Direct-Mapped, Writeback Connections
    B.10.3 4-Kbyte D-Cache Direct-Mapped, Writeback Connections
    B.10.4 8-Kbyte D-Cache Direct-Mapped, Writeback Connections
    B.10.5 1-Kbyte D-Cache Direct-Mapped, Writethrough Connections
    B.10.6 2-Kbyte D-Cache Direct-Mapped, Writethrough Connections
    B.10.7 4-Kbyte D-Cache Direct-Mapped, Writethrough Connections
    B.10.8 8-Kbyte D-Cache Direct-Mapped, Writethrough Connections
    B.10.9 Data Scratchpad RAM

Appendix C Programmer's Notes
  C.1 Instruction-Related Notes
  C.2 CP0 or TLB-Related Notes
  C.3 Cache-Related Notes
  C.4 CW33300 Compatible Debug Extension Notes

Glossary
Customer Feedback

Figures
  1.1 CW4011 Core and Building Blocks Diagram
  2.1 CW4011 Block Diagram
  2.2 CW4011 Instruction Pipeline
  3.1 Instruction Format
  3.2 Byte Specifications for Loads/Stores
  4.1 Context Register
  4.2 DCS Register
  4.3 BadVAddr Register
  4.4 Count Register
  4.5 Compare Register
  4.6 Status Register (R4000 Mode)
  4.7 Status Register (R3000 Mode)
  4.8 Status Register and Exception Recognition
  4.9 Cause Register
  4.10 EPC Register
  4.11 PRId Register
  4.12 CCC Register
  4.13 LLAddr Register
  4.14 BPC Register
  4.15 BDA Register
  4.16 BPCM Register
  4.17 BDAM Register
  4.18 Rotate Register
  4.19 CMask Register
  4.20 Error EPC Register
  4.21 Cold Reset Exception
  4.22 Warm Reset, NMI Exceptions
  4.23 Common Exceptions
  4.24 Debug Exception
  4.25 External Vectored Interrupt Exception
  5.1 TLB Block Diagram
  5.2 CW4011 Virtual Memory Map
  5.3 CW4011 Virtual Address Format
  5.4 Format of CW4011 TLB Entry
  5.5 EntryHi Register
  5.6 EntryLo Register
  5.7 PageMask Register
  5.8 Index Register
  5.9 Random Register
  5.10 Wired Register Location
  5.11 Wired Register
  5.12 CW4011 TLB Address Translation Process
  6.1 Cache State Diagram: I-Cache and Writethrough D-Cache
  6.2 Cache State Diagram: D-Cache Writeback
  6.3 Address to Cache Tag and Line Number
  6.4 Cache Instruction Format
  6.5 Tag Test Mode Loaded Data Format
  7.1 Core Interface Connections
  7.2 CW4011 Logic Diagram
  8.1 Cold Reset and Pipeline
  8.2 nmin and Pipeline (Detected Immediately)
  8.3 nmin and Pipeline (nmin Is Not Detected Immediately Due to Stall)
  8.4 Bus Error and Pipeline (Detected Immediately)
  8.5 Bus Error and Pipeline (with Stall Cycles)
  8.6 Interrupt and Pipeline (Detected Immediately)
  8.7 Fastest Accepted Case of External Vectored Interrupt
  8.8 WAITI and Pipeline Stall (cpzstallp)
  8.9 SCbus Basic Transaction
  8.10 SCbus Eight-Word Burst Transaction Timing Chart
  8.11 SCbus Eight-Word Burst Transaction
  8.12 SCbus Eight-Word Burst Transaction
  8.13 SCbus In-Page Write Transaction (Four Words)
  8.14 SCbus Hold Request and Grant
  8.15 Sampled Bytes of First and Second Transaction SCbus Data
  8.16 Read Bytes to ISU and LSU with Sizing
  8.17 Write Bytes to the SCbus with Sizing
  8.18 Write Data Bytes from LSU
  8.19 SCbus Locked Transaction
  8.20 Typical OCAbus Transaction
  8.21 OCAbus Transaction Rejected by Address Decoder
  8.22 OCAbus with Stall at EX Stage
  8.23 OCAbus Access with Stall at CR Stage
  8.24 OCAbus Access with Stall Request
  8.25 OCAbus Access with Pipeline Cancel
  8.26 D-Cache Invalidation by Snooping
  8.27 I-Cache Invalidation by Snooping
  9.1 CW4011 Design with ICEport
  9.2 CW4011 ICEport Block Diagram
  9.3 ICEport Logic Diagram
  9.4 Rx Status Register
  9.5 Rx Setup Register
  9.6 Rx Data Register
  9.7 Tx Status Register
  9.8 Tx Data Register
  9.9 Read Operation
  9.10 Write Operation
  9.11 Serial Bit Stream
  9.12 Rx and Tx Blocks
  9.13 Received Bit Timing
  10.1 AC Specifications
  A.1 CW4011 CPU Registers
  B.1 CW4011 I-Cache Shell RTL
  B.2 CW4011 D-Cache Shell RTL

Tables
  2.1 CW4011 Instruction Set Summary
  2.2 Instruction Set Extensions
  3.1 Load and Store Instruction Summary
  3.2 ALU Immediate Instruction Summary
  3.3 Three-Operand, Register-Type Instruction Summary
  3.4 Shift Instruction Summary
  3.5 Multiply/Divide Instruction Summary
  3.6 Computation Instruction Extensions Summary (CW4011 ISA)
  3.7 Execution Time of Multiply and Divide Instructions
  3.8 Jump Instruction Summary
  3.9 Branch Instruction Summary
  3.10 Branch Likely Instruction Summary (MIPS II ISA Extensions)
  3.11 Trap Instruction Summary (MIPS II ISA Extensions)
  3.12 Special Instruction Summary
  3.13 Coprocessor Instruction Summary
  3.14 CP0 Instruction Summary
  3.15 Cache Maintenance Instruction Summary
  3.16 CW4011 Instruction Set Extensions
  3.17 CW4011 Opcode Bit Encoding
  3.18 SPECIAL Opcode Bit Encoding
  3.19 REGIMM Opcode rt Bit Encoding
  3.20 CACHE x2 Opcode rt Bit Encoding
  3.21 COPz rs Opcode Bit Encoding
  3.22 COPz rt Opcode Bit Encoding
  3.23 CP0 Opcode Bit Encoding
  4.1 CW4011 Exceptions
  4.2 CP0 Exception Processing Registers
  4.3 Cause Register ExcCode Field
  4.4 Current Processor Mode
  4.5 Exception Vector Base Addresses
  4.6 Exception Vector Offset Addresses
  4.7 Exception Priority Order
  5.1 I-Cache Algorithm Criteria
  5.2 D-Cache Algorithm Criteria
  5.3 TLB Support Registers
  5.4 TLB Instruction
  6.1 D-Cache Writeback Mode
  6.2 Setting Cache Size
  6.3 Scratchpad RAM Enables
  6.4 CCC Bits Related to Cache Configuration
  6.5 Tag and Inv Encoding
  6.6 Tag and Inv Encoding
  8.1 Common Exception Vector
  8.2 SCbus Transaction Types
  8.3 Big-Endian Arbitrary Signal Names
  8.4 Big-Endian Valid Bytes
  8.5 Data Bus and Byte Enable Connections
  8.6 CW4011 Accesses Through Off-Core Buses
  9.1 ICEport Registers
  10.1 CW4011 Physical Layout Size
  10.2 CW4011 Timing Considerations
  10.3 CW4011 Input AC Timing and Loading
  10.4 CW4011 Output AC Timing and Loading
  A.1 CP0 Exception Processing Registers
  B.1 CW4011 I-Cache Sizes
  B.2 Set Associative, I-Cache RAM Requirements
  B.3 Direct-Mapped, Writeback, I-Cache RAM Requirements
  B.4 CW4011 D-Cache Sizes
  B.5 D-Cache Data Interleaving
  B.6 Set Associative, Writeback, D-Cache RAM Requirements
  B.7 Set Associative, Writethrough, D-Cache RAM Requirements
  B.8 Direct-Mapped, Writeback, D-Cache RAM Requirements
  B.9 Direct-Mapped, Writethrough, D-Cache RAM Requirements
Preface

This book is the primary reference and technical manual for the MiniRISC CW4011 Superscalar Microprocessor Core, referred to in this document as the CW4011 or as the core. This book contains a complete functional description of the CW4011.

Audience

This book is intended for use by engineers and managers who are evaluating the CW4011 core, or for engineers who are designing with this core. This book assumes that this audience is familiar with the concepts of microprocessors and related support devices.

Organization

This book has the following chapters and a glossary of terms.

Chapter 1, Introduction, provides an overview of the CW4011 core and describes the features of the LSI Logic CoreWare® program.

Chapter 2, Architectural Overview, describes the CPU pipeline and microarchitecture, the instruction set architecture, the system coprocessor (CP0), memory management, exception processing, and cache maintenance.

Chapter 3, Instruction Set, describes the MIPS R-Series instructions and the instruction set extensions supported in the CW4011 core.

Chapter 4, CW4011 Exception Processing, describes how the CW4011 handles exception processing.

Chapter 5, CW4011 Memory Management, provides detailed information about CP0 and the CW4011 memory management system.
Chapter 6, CW4011 Caches, provides detailed information about the CW4011 caches and cache maintenance.

Chapter 7, CW4011 Signals, describes the CW4011 core I/O signals.

Chapter 8, Interface Operation, describes the main timing scenarios for CW4011 transactions.

Chapter 9, ICEport, describes the CW4011 SerialICE port building block.

Chapter 10, Specifications, contains physical specifications and AC timing for the CW4011 core.

Appendix A, CW4011 Register Summary, provides an overview of all core registers and general MIPS register architecture.

Appendix B, Cache Sizing and Design Concerns, provides information about connecting and selecting different CW4011 cache sizes.

Appendix C, Programmer's Notes, provides information that is useful if you are writing software for the CW4011 core.

Related Publications

CW33300 Enhanced Self-Embedding Processor Core User's Manual, LSI Logic Corporation, Order No. C14014

LR4500 Superscalar Microprocessor Technical Manual, LSI Logic Corporation, Document No. DB14-000068-00

BDMR4011 Evaluation Board User's Guide, LSI Logic Corporation, Document No. DB15-000055-00

MIPS SerialICE™ User's Guide, available from LSI Logic Corporation

Conventions Used in This Manual

The term word is used to define a 32-bit quantity, either signed or unsigned. This means that in the CW4011 core a word consists of four 8-bit bytes; a doubleword has 64 bits, or eight 8-bit bytes; and a halfword has 16 bits, or two 8-bit bytes.

Hexadecimal numbers are indicated by the prefix 0x before the number, for example, 0x32cf. Binary numbers are indicated by the prefix 0b before the number, for example, 0b0111 0011 0000.
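As a quick illustration of these size conventions, the following C sketch maps each term onto a fixed-width type from `<stdint.h>` and checks the byte counts at compile time. The type and constant names (`cw_word_t`, `kHexExample`, and so on) are hypothetical and are not defined by LSI Logic. Standard C before C23 has no `0b` literal prefix, so the binary example is written in hexadecimal.

```c
#include <stdint.h>

/* Hypothetical type names for the manual's size conventions. */
typedef uint8_t  cw_byte_t;        /* 8 bits                      */
typedef uint16_t cw_halfword_t;    /* 16 bits: two 8-bit bytes    */
typedef uint32_t cw_word_t;        /* 32 bits: four 8-bit bytes   */
typedef uint64_t cw_doubleword_t;  /* 64 bits: eight 8-bit bytes  */

/* C11 compile-time checks of the byte counts stated in the text. */
_Static_assert(sizeof(cw_halfword_t) == 2, "halfword is two bytes");
_Static_assert(sizeof(cw_word_t) == 4, "word is four bytes");
_Static_assert(sizeof(cw_doubleword_t) == 8, "doubleword is eight bytes");

/* The manual's numeric examples; 0b0111 0011 0000 is 0x730 in hex. */
static const cw_word_t kHexExample = 0x32cf;  /* 13007 decimal */
static const cw_word_t kBinExample = 0x730;   /* 1840 decimal  */
```

Grouping the binary digits in fours, as the manual does, makes the hexadecimal equivalent easy to read off: 0111 is 7, 0011 is 3, and 0000 is 0.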
The following signal conventions are used throughout the manual:

• Active-low signals have a lowercase n at the end of the signal name (for example, resetn).
• Active-high signals have a lowercase p at the end of the signal name (for example, scaop).

The term assert means to drive a signal true or active. The term deassert means to drive a signal false or inactive.

Revision History

This table details the changes in this manual over the document's history. It is not intended to reflect all changes, but should be used as a revision overview.

Document Version   Release Date    Comments
Preliminary        July 1997       Initial release. This document was generated from LSI Logic's MiniRISC® CW4010 Superscalar Microprocessor Core Technical Manual.
Final              October 1997    Final release for revision A. Added Chapter 10, Specifications, and revised Appendix B, Cache Sizing and Design Concerns, to reflect the proper CW4011 cache connections. Minor modifications made to other chapters.
Chapter 1
Introduction

This chapter introduces the LSI Logic CoreWare program and describes its features. It also provides an overview of the CW4011 core. This chapter contains the following sections:

• Section 1.1, CW4011 Overview
• Section 1.2, CW4011 Core and Building Blocks
• Section 1.3, Features
• Section 1.4, CoreWare Program

1.1 CW4011 Overview

LSI Logic Corporation has developed the MIPS II-compatible MiniRISC CW4011 superscalar core using LSI Logic's CoreWare system-on-a-chip methodology. The CW4011 is a member of LSI Logic's MiniRISC family, the next generation of MIPS RISC products. You can use the superscalar CW4011 as a microprocessor core in products that require higher performance than an LSI Logic CW400x microprocessor core provides. The CW4011 core is available as a CoreWare product for use in customer ASIC designs, and is also used in LSI Logic's ASSPs (application-specific standard products), such as the ATMizer® II ATM-SAR chip.
1.2 CW4011 Core and Building Blocks

As shown in Figure 1.1, the CW4011 is implemented at two levels: the standard CW4011 core and the optional shell.

[Figure 1.1 CW4011 Core and Building Blocks Diagram]
1.2.1 CW4011 Core

The CW4011 superscalar microprocessor core is an encrypted, synthesizable Verilog (or VHDL) model. It is process independent and made up of the following units:

• Arithmetic logic unit (ALU)
• System control coprocessor (CP0)
• Bus interface unit (BIU)
• Load store unit (LSU)
• Instruction scheduler unit (ISU)

1.2.2 CW4011 Shell

The following microprocessor building blocks are available with the basic microprocessor core as part of the CW4011 shell. The shell is an unencrypted Verilog model that can include:

• Direct-mapped or two-way set associative instruction cache (I-cache) with cache sizes selectable up to 16 Kbytes
• Direct-mapped or two-way set associative data cache (D-cache) with cache sizes selectable up to 16 Kbytes
• Simple memory management unit (MMU) with an optional translation lookaside buffer (TLB)
• Standard multiply/divide unit (MDU) or a high-performance multiply/accumulate unit (MAC)
• Writeback buffer for writeback cache mode

1.2.3 Other CW4011 Components

The following components are typically included in any CW4011 design and are implemented in the LR4500 reference device:

• ICEport (UART) for SerialICE™ hardware and software debugging support. A SerialICE manual is available from LSI Logic upon request.
• SCbus/LBus converter (SCLC) for off-chip components.
• Synchronous DRAM controller (SDRAMC) to interface the core SCbus with off-chip memory.

Please note the following two considerations for any CW4011 microprocessor core design:

• LSI Logic provides the SDRAMC and SCLC modules as source code only. LSI Logic does not supply product support or documentation for these optional building block modules.
• The MMU and MDU components are considered part of the building blocks shell and are not open to users. To modify either the MDU or MMU for your design, please contact LSI Logic.

1.2.4 Interfaces

The core has four major interfaces:

• The coprocessor interface connects the core with up to three coprocessors (CP1, CP2, and CP3), as well as the internal coprocessor (CP0).
• The cache invalidation interface connects the core with optional cache coherency logic. The core uses this bus to communicate only with the on-chip caches.
• The SCbus, the bidirectional system bus, allows the CW4011 to communicate with system elements outside the core, such as the SCLC and the SDRAMC.
• The OCAbus interface allows on-chip access (OCA) to on-chip modules at the cache read pipeline stage without going through the SCbus.

1.2.5 Related Modules

In addition to the core, the MiniRISC product family includes a variety of other modules, including:

• LSI Logic's MiniSIM® performance simulator
• Verilog and VHDL models
• A system verification environment (SVE)
• A PROM monitor
• Third-party software support (compiler and RTOS)
• LR4500 evaluation chip
• Evaluation boards for concurrent software development
• LSI Logic's CoreWare program, described in Section 1.4, CoreWare Program

1.3 Features

The CW4011 core has the following features:

• Full MIPS II instruction set implementation (R4000 32-bit mode compatible)
• Instruction set extensions to support embedded applications
• Superscalar execution with up to two instructions issued per clock cycle
• 64-bit on-chip data bus system interface
• High-performance coprocessor interface for user-definable coprocessors and a high-performance hardware floating-point unit (FPU)
• 3.3-volt operation
• 90-MHz worst-case commercial maximum clock rate using standard cell ASIC
• 130+ Dhrystone MIPS at 90 MHz
• 180 native MIPS peak, 120 native MIPS sustained with standard compiled MIPS code at 90 MHz
• 7.0 mW/MHz core power with power management
• Integrated cache controllers with separate instruction and data caches
  – D-cache set sizes selectable from 1 to 8 Kbytes (up to two sets available)
  – I-cache set sizes selectable from 1 to 8 Kbytes (up to two sets available)
• Optional, modifiable building blocks, such as a MAC and an MMU
• SerialICE scan chain allows full testing in embedded ASIC designs
• Models available:
  – Performance and software development model
  – Verilog and VHDL models (referred to in this manual as HDL models)
  – Gate-level, timing-accurate model in various third-party simulation environments
• Compatible with the full range of MIPS and third-party software development tools
• Compact basic microprocessor core size: 2.5 by 3.5 mm, including BIU, cache controllers, and external write buffer
• R3000 compatibility mode for exception handling and status registers

1.4 CoreWare Program

An LSI Logic core is a fully defined, optimized, and reusable block of logic. It supports industry-standard functions and has predefined timing and layout. The core is also an encrypted RTL simulation model for a wide range of VHDL and Verilog simulators.

The CoreWare library contains an extensive set of complex cores for the communications, consumer, and computer markets. The library consists of high-speed interconnect functions such as the GigaBlaze™ G10™ core, MIPS embedded microprocessors, MPEG-2 decoders, a PCI core, and many more. The library also includes megafunctions, or building blocks, which provide useful functions for developing a system on a chip. Through the CoreWare program, you can create a system on a chip uniquely suited to your applications.

Each core has an associated set of deliverables, including:

• RTL simulation models for both Verilog and VHDL environments
• An SVE for RTL-based simulation
• Netlists for full timing simulation
• Complete documentation
• LSI Logic toolkit support

LSI Logic's toolkit provides seamless connectivity between products from leading electronic design automation (EDA) vendors and LSI Logic's manufacturing environment. Standard interfaces for formats and languages such as VHDL, Verilog, Waveform Generation Language (WGL), Physical Design Exchange Format (PDEF), and Standard Delay Format (SDF) allow a wide range of tools to interoperate within the LSI Logic toolkit environment. In addition to design capabilities, full-scan automatic test pattern generation (ATPG) tools and LSI Logic's specialized test solutions can be combined to provide high-fault-coverage test programs that assure a fully functional design.

Because your design requirements are unique, LSI Logic is flexible in working with you to develop your system-on-a-chip CoreWare design. Three different work relationships are available:

• You provide LSI Logic with a detailed specification, and LSI Logic performs all design work.
• You design some functions while LSI Logic provides you with the cores and megafunctions, and LSI Logic completes the integration.
• You perform the entire design and integration, and LSI Logic provides the core and associated deliverables.

Whatever the work relationship, LSI Logic's advanced CoreWare methodology and ASIC process technologies consistently produce Right-First-Time™ silicon.
Chapter 2
Architectural Overview

This chapter discusses the CPU pipeline, CPU architecture, instruction set architecture, the system coprocessor (CP0), memory management, exception processing, and cache maintenance. This chapter is divided into the following sections:

• Section 2.1, Architectural Overview
• Section 2.2, Cache and External Interface
• Section 2.3, Clocking and Power Management
• Section 2.4, Pipeline Architecture
• Section 2.5, Instruction Set Summary
• Section 2.6, Configurability and Options

2.1 Architectural Overview

The CW4011 is fully compatible with the R3000 and R4000 32-bit instruction sets (MIPS I and MIPS II), but it uses an updated hardware architecture to provide higher absolute performance than any other MIPS core available at the time of publication. The CW4011 also provides substantially better instructions-per-clock performance than other MIPS processors. At the same time, the hardware design remains compact in comparison with similar superscalar architectures.

The CW4011 implements a 32-bit virtual address space, with up to 2 Gbytes of virtual address space available to each user-level process. Individual memory locations are byte-addressed.

The CW4011 also implements a 32-bit physical address space. Because individual memory locations are byte-addressed, a 32-bit physical address gives a total of 4 Gbytes of physical address space.
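To make the 2-Gbyte user limit and the 4-Gbyte total concrete, the sketch below classifies a 32-bit virtual address into the conventional MIPS segments (kuseg, kseg0, kseg1, kseg2) and reports whether user-mode code may use it. It assumes the standard MIPS 32-bit segment boundaries; the CW4011's exact map is given in Chapter 5, and the names `classify_vaddr`, `user_accessible`, and `unmapped_to_physical` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; boundaries follow the conventional MIPS 32-bit map. */
typedef enum { SEG_KUSEG, SEG_KSEG0, SEG_KSEG1, SEG_KSEG2 } cw_segment_t;

static cw_segment_t classify_vaddr(uint32_t va) {
    if (va < 0x80000000u) return SEG_KUSEG;  /* 2 Gbytes, mapped, user-accessible */
    if (va < 0xA0000000u) return SEG_KSEG0;  /* 512 Mbytes, unmapped, cached      */
    if (va < 0xC0000000u) return SEG_KSEG1;  /* 512 Mbytes, unmapped, uncached    */
    return SEG_KSEG2;                        /* 1 Gbyte, mapped, not user-accessible */
}

/* User-mode code may only use the low 2^31 bytes (2 Gbytes). */
static bool user_accessible(uint32_t va) {
    return classify_vaddr(va) == SEG_KUSEG;
}

/* kseg0 and kseg1 both alias the low 512 Mbytes of physical memory,
 * so their physical address is simply the low 29 bits. */
static bool unmapped_to_physical(uint32_t va, uint32_t *pa) {
    cw_segment_t s = classify_vaddr(va);
    if (s == SEG_KSEG0 || s == SEG_KSEG1) {
        *pa = va & 0x1FFFFFFFu;
        return true;
    }
    return false;  /* kuseg and kseg2 need TLB (or simple MMU) translation */
}
```

For example, `user_accessible(0x7FFFFFFF)` is true and `user_accessible(0x80000000)` is false, which is exactly the 2^31-byte boundary; the full 32-bit range spans 2^32 bytes, or 4 Gbytes. On conventional MIPS parts the uncached address 0xBFC00000 maps to physical 0x1FC00000.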
The CW4011 can issue and complete up to two instructions per cycle using a combination of five independent execution units:

• Arithmetic logic unit (ALU)
• Load/store unit (LSU). The LSU executes load and store instructions. It also executes add and load immediate instructions, so an add instruction can be issued together with another add or logical instruction.
• Branch unit
• Multiply/shift unit
• Coprocessor interface. The coprocessor interface can feed an instruction to a customer-defined coprocessor unit. Contact LSI Logic for further information if your design requires a coprocessor.

All instructions except multiply and divide can complete in a single cycle. Load instructions that hit in the cache have a single hardware delay slot, but the hardware interlocks on register conflicts, so a NOP (no operation) is not required in the delay slot. On a load miss, the CW4011 extends this hardware conflict detection: if subsequent instructions in the pipeline do not need the load data, the CPU is not stalled. This operation is called load scheduling.

Figure 2.1 shows a block diagram of the basic CW4011 core. Three units handle instructions:

• The ifetch queue optimizes the supply of instructions to the microprocessor, even across breaks in the sequential flow of execution (jumps and branches).
• The idecode unit decodes the instructions from the ifetch queue, determines the actions required for instruction execution, and manages the register file, LSU, ALU, and multiply/divide units accordingly.
• The branch unit is used when branch and jump instructions are recognized within the instruction stream.
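The load interlock and load-scheduling behavior described earlier in this section can be modeled as a register scoreboard. The following C sketch is a simplified software model under stated assumptions, not the CW4011 hardware: each outstanding load miss sets a pending bit for its destination register, an instruction stalls only if one of its source or destination registers is still pending, and the bit clears when the load data returns. The structure and function names are hypothetical. Register r0 is never marked pending, because it always reads as zero and writes to it are discarded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scoreboard: bit n set means GPR n still awaits load-miss data. */
typedef struct {
    uint32_t pending;
} scoreboard_t;

static uint32_t reg_bit(unsigned r) {
    /* r0 is hardwired to zero, so it never creates a dependency. */
    return (r == 0 || r > 31) ? 0u : (1u << r);
}

/* A load that misses in the D-cache marks its destination as pending. */
static void load_miss_issued(scoreboard_t *sb, unsigned rt) {
    sb->pending |= reg_bit(rt);
}

/* Returning load data clears the pending bit. */
static void load_data_returned(scoreboard_t *sb, unsigned rt) {
    sb->pending &= ~reg_bit(rt);
}

/* Stall only if the instruction touches a register still awaiting data;
 * independent instructions keep flowing (the load-scheduling case).
 * Pass 0 for operand fields an instruction does not use. */
static bool must_stall(const scoreboard_t *sb,
                       unsigned rs, unsigned rt, unsigned rd) {
    uint32_t used = reg_bit(rs) | reg_bit(rt) | reg_bit(rd);
    return (sb->pending & used) != 0;
}
```

For example, after `lw $8, 0($4)` misses, `addu $11, $9, $10` can proceed while `addu $11, $8, $10` waits until the data for `$8` returns. Treating a pending destination as a conflict is a conservative assumption of this sketch that keeps a later write from being overtaken by the returning load.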
[Figure 2.1 CW4011 Block Diagram]

The register file contains the core's general-purpose registers. The CPU has 32 general-purpose registers: 31 read/write registers and one zero register (r0), which always reads as zero. The register file supplies source operands to the execution units and handles the storage of results to target registers.

Three units perform logical, arithmetic, and data-movement operations:

• The LSU manages loads and stores of data values. Data values are loaded from the D-cache or, in the event of a D-cache miss, from the SCbus interface. Stores pass to the D-cache and the SCbus interface through the write buffer. The LSU can also perform a restricted set of arithmetic operations, including the addition of an immediate offset as required in address calculations.
• The ALU calculates the result of an arithmetic or logical operation.
- The multiply/shift unit performs multiply and divide operations. You can select from a number of modular options for this unit, including an option with full multiply/accumulate capability.

The CW4011 core has four major interfaces for data transfer:
- The BIU manages the flow of instructions and data between the core and the system by means of the SCbus interface. This interface provides the main channel for communication between the CW4011 core and the other functional blocks in the system. Some blocks may be implemented as CoreWare library functions integrated on the same die as the microprocessor core; others may be implemented in separate devices connected through I/O pins at the board level.
- The coprocessor interface allows tightly coupled special-purpose processing units to be attached to the core, enhancing the microprocessor's general-purpose computational power. Contact LSI Logic for further information if you need a coprocessor in your design.
- The cache invalidation interface allows supporting hardware outside the core to maintain the coherency of on-board cache contents in systems that include multiple main-bus masters.
- The OCAbus interface allows on-chip modules to be accessed at the cache read (CR) stage of the pipeline without going through an SCbus transaction. This improves performance because it reduces traffic on the SCbus and therefore reduces latency.

2.2 Cache and External Interface

I-cache control is performed by the ISU, and D-cache control by the LSU. A write buffer is also implemented within the LSU so that CPU execution need not stall when a number of stores are performed in quick succession. The write buffer accepts the store addresses and data values and passes them on to main memory as rapidly as main memory can accept them. During this time, the CPU proceeds with execution.

The BIU provides the interface to on-chip peripherals.
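The write buffer's role can be illustrated with a small FIFO model. This is a minimal sketch, not the CW4011 design: the depth `WB_DEPTH` of 4 is an assumption (the text gives no value), and `wb_push`/`wb_pop` are hypothetical names for the CPU side accepting a store and the memory side retiring the oldest one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of a store write buffer as a fixed-depth FIFO.
   WB_DEPTH = 4 is an assumed value for illustration only. */
#define WB_DEPTH 4

typedef struct { uint32_t addr, data; } StoreEntry;

typedef struct {
    StoreEntry e[WB_DEPTH];
    unsigned head, count;        /* head indexes the oldest entry */
} WriteBuffer;

/* CPU side: accept a store. Returns false (CPU must stall) only when full. */
static bool wb_push(WriteBuffer *wb, uint32_t addr, uint32_t data)
{
    if (wb->count == WB_DEPTH)
        return false;
    unsigned tail = (wb->head + wb->count) % WB_DEPTH;
    wb->e[tail] = (StoreEntry){ addr, data };
    wb->count++;
    return true;
}

/* Memory side: retire the oldest store when memory can accept it. */
static bool wb_pop(WriteBuffer *wb, StoreEntry *out)
{
    if (wb->count == 0)
        return false;
    *out = wb->e[wb->head];
    wb->head = (wb->head + 1) % WB_DEPTH;
    wb->count--;
    return true;
}
```

Stores leave in program order, and the CPU only stalls when a burst of stores outruns memory long enough to fill every entry.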
One or more peripherals will typically provide a path to off-chip resources, including main memory. The BIU supports dynamic bus sizing between 32-bit and 64-bit transactions; see Section 2.6.3, 64-Bit vs. 32-Bit Memory Interface, for more information on bus sizing.

The on-chip system interface presented by the BIU is the SCbus. This bus has a 64-bit data bus and a 32-bit address bus; address and data are not multiplexed. I-cache and D-cache refills use the 64-bit data bus to achieve the highest possible performance.

2.3 Clocking and Power Management

The CPU core is clocked by a single-phase, 1x clock with a 40% to 60% duty cycle requirement. Applications that require a slower system clock interface may use a phase-locked loop (PLL) circuit, available as a cell in LSI Logic's ASIC libraries, together with logic that implements a clock multiplier circuit for the CPU.

Power management is provided for the CPU by the WAITI (wait for interrupt) instruction and by gating the clock separately for each functional unit, so that units are clocked only when needed. In addition, the core and cache RAMs are static, so the clock may be slowed or turned off by user logic to save power.

2.4 Pipeline Architecture

This section describes the CPU pipelines and instruction fetching and scheduling. It also contains an instruction set summary. As shown in Figure 2.2, the CW4011 core has two identical, concurrent five-stage pipelines that provide the core with its superscalar capabilities. One pipeline is known as the even slot, or pipeline 0, and the other as the odd slot, or pipeline 1.
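Dynamic bus sizing can be pictured as carrying one 64-bit transfer either as a single beat or as two 32-bit beats. The sketch below is a hypothetical illustration; `split_transfer` is not a CW4011 interface, and sending the low word first is an assumption (little-endian ordering), since the actual SCbus beat order is not given here.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of dynamic bus sizing: a 64-bit transfer is one beat on a 64-bit
   bus, or two 32-bit beats on a 32-bit bus. Low word first is an assumption. */
static unsigned split_transfer(uint64_t data, unsigned bus_bits, uint64_t beats[2])
{
    assert(bus_bits == 32 || bus_bits == 64);
    if (bus_bits == 64) {
        beats[0] = data;
        return 1;
    }
    beats[0] = data & 0xFFFFFFFFu;   /* bits 31..0 */
    beats[1] = data >> 32;           /* bits 63..32 */
    return 2;
}
```

The same cache-line refill therefore takes twice as many bus beats on a 32-bit configuration, which is the bandwidth trade-off discussed in Section 2.6.3.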
Figure 2.2 CW4011 Instruction Pipeline (even slot/pipeline 0 and odd slot/pipeline 1, each passing through the IF, Q, RD, EX, CR, and WB stages, grouped into instruction fetch and instruction execution)

The first two pipeline stages (and the conditional Q stage) are used during instruction fetch, and the last three stages during instruction execution. Once a stage has accepted an instruction from the previous stage, it must hold the instruction for re-execution in case the pipeline stalls. The function of each pipeline stage is summarized below.
- IF (instruction fetch): The CW4011 fetches the instruction during this first stage.
- Q (queuing): Instructions enter this conditional stage when they are held up by execution unit or register conflicts. An instruction that does not cause an execution unit or register conflict is fed directly to the RD stage.
- RD (read): Any required operands are read from the register file while the instruction is decoded.
- EX (execute): All instructions are executed in this stage. Conditional branches are resolved in this cycle, and the address calculation for load and store instructions is performed here.
- CR (cache read): This stage is used to access the cache for load and store instructions. Data is returned to the register bypass logic at the end of this stage.
- WB (writeback): Results are written into the register file during this stage.
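The stage sequence above, with Q entered only when RD is busy, can be modelled as a tiny state machine. This is a simplified sketch under stated assumptions: `next_stage` and `cycles_to_wb` are hypothetical helpers, and the model ignores Q-stage capacity, branches, and stalls after RD.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of one pipeline slot. rd_busy means the RD stage is still
   occupied by a stalled instruction ahead of this one. */
typedef enum { ST_IF, ST_Q, ST_RD, ST_EX, ST_CR, ST_WB, ST_DONE } Stage;

static Stage next_stage(Stage s, bool rd_busy)
{
    switch (s) {
    case ST_IF: return rd_busy ? ST_Q : ST_RD;   /* conditional Q stage */
    case ST_Q:  return rd_busy ? ST_Q : ST_RD;   /* wait in Q until RD frees up */
    case ST_RD: return ST_EX;
    case ST_EX: return ST_CR;
    case ST_CR: return ST_WB;
    default:    return ST_DONE;
    }
}

/* Cycles from IF until the instruction reaches WB, if RD stays busy for
   rd_busy_cycles cycles after this instruction is fetched. */
static unsigned cycles_to_wb(unsigned rd_busy_cycles)
{
    Stage s = ST_IF;
    unsigned cycles = 0;
    while (s != ST_WB) {
        bool busy = rd_busy_cycles > 0;
        if (busy)
            rd_busy_cycles--;
        s = next_stage(s, busy);
        cycles++;
    }
    return cycles;
}
```

With no stall, an instruction moves IF, RD, EX, CR, WB and reaches WB four cycles after IF; each cycle that RD stays busy adds one cycle spent in Q.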
Sections 2.4.1 and 2.4.2 provide more detailed information about pipeline transactions.

2.4.1 Instruction Fetch and Scheduling

The IF, Q, and RD stages fetch two instructions per cycle and issue them to the EX stage. The CW4011 fetches instructions as doubleword-aligned pairs (even and odd). There is a two-instruction window in the RD stage during instruction decode. When only the even slot can be scheduled because the odd slot has a dependency, the window slides down one instruction. In other words, although instructions are always fetched as doubleword pairs, they are scheduled on single-word boundaries.

The primary purpose of the Q stage is to execute branch instructions with minimal penalty. The CW4011 generally fills the Q stage whenever the RD stage has to stall. This occurs fairly frequently in typical compiled code because of register conflicts, cache misses, and resource conflicts. Filling the Q stage in these cases allows the IF stage to work ahead one cycle.

If a branch instruction is encountered when the Q stage is already active, the branch is predicted to be taken. The IF stage does not bring in any more instructions following the current address, but instead begins fetching instructions starting at the branch target address. At this point, the Q stage still holds the pair of instructions immediately following the pair that contained the branch. The branch target enters the RD stage, bypassing the Q stage, as shown in Figure 2.2.

The branch prediction logic in the ISU resolves the branch condition when the branch instruction enters the EX stage. If the branch was predicted correctly, the instructions in the Q stage are cancelled. If it was predicted incorrectly, the ISU cancels the branch target; in this case, it takes the non-branch sequential instructions from the Q stage and restarts the IF stage at the non-branch sequential stream.
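The predict-taken scheme and its recovery path can be sketched as two state updates. This is a hypothetical model: `FetchState`, `predict_taken`, and `resolve_branch` are illustrative names, and `seq_pc` is assumed to be the address of the first sequential instruction not yet fetched (the saved non-branch PC mentioned in the text).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch of predict-taken handling with an active Q stage. */
typedef struct {
    uint32_t fetch_pc;       /* address the IF stage fetches next */
    uint32_t saved_seq_pc;   /* non-branch PC saved for a possible restart */
    bool     q_valid;        /* Q stage holds the sequential instruction pair */
} FetchState;

/* Branch seen while Q is active: predict taken and redirect IF. */
static void predict_taken(FetchState *f, uint32_t target, uint32_t seq_pc)
{
    f->saved_seq_pc = seq_pc;
    f->fetch_pc = target;
}

/* Branch condition resolved in EX. Returns true if the prediction held. */
static bool resolve_branch(FetchState *f, bool taken)
{
    if (taken) {
        f->q_valid = false;              /* correct: cancel the pair in Q */
        return true;
    }
    /* Mispredicted: target instructions are cancelled (not modelled here),
       execution continues from Q, and IF restarts on the sequential stream. */
    f->fetch_pc = f->saved_seq_pc;
    return false;
}
```

Either outcome leaves useful instructions ready: the target instructions on a correct prediction, or the sequential pair already waiting in Q on a misprediction.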
The process differs depending on whether the branch instruction is in the even or the odd instruction slot. If the branch prediction logic correctly predicts a branch in the even instruction slot when the Q stage is full, there is generally no cycle penalty. If the branch is predicted incorrectly, there is a one-cycle penalty.
If the branch instruction is in the odd instruction slot, the branch delay slot instruction always executes by itself and has no chance to fill the other execution slot. An assembler that attempts to place branches at even word addresses may therefore gain some advantage.

The branch prediction logic must be able to look at two instructions at the same time, from either the Q latches or the RD latches, depending on whether the Q stage is active. If one of the two instructions is a branch, the logic passes the offset in that instruction to a dedicated adder that calculates the branch address for the IF stage. Because this is done speculatively, the logic also saves the non-branch value of the PC (program counter) for a possible restart of the sequential instructions from the Q stage.

After the ISU has allowed an instruction pair to pass into the RD stage, the instructions are decoded and, at the same time, the register source addresses are passed to the register file so that the operands can be read. Register dependencies and resource dependencies are checked in this stage. If the instruction in the even slot has no dependency on a register or resource currently tied up by a previous instruction, it is passed immediately into the EX stage, where it forks to the appropriate execution unit. The instruction in the odd slot may also depend on a resource or register used by the even slot, so it must be checked for dependencies against both the even slot and any previous unretired instruction. If either instruction must be held in the RD stage and the Q stage is not full, the IF stage is allowed to continue filling the Q stage. If the Q stage is full, the Q and IF stages are frozen (stalled).

In the RD stage, register bypass opportunities are considered, and the bypass multiplexer control signals are set for potential bypass cases from a previous instruction still in the pipeline.
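The RD-stage check on an even/odd pair can be sketched with a bit mask of registers still awaited from earlier instructions. This is a simplified illustration: `Op`, `reads_busy`, and `issue_count` are hypothetical, and resource (execution unit) conflicts are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified dual-issue check. busy has bit n set while register n is still
   awaited from an earlier, unretired instruction. Resource conflicts are
   not modelled. */
typedef struct {
    uint8_t src1, src2;   /* source registers (0 if unused) */
    uint8_t dest;         /* destination register (0 if none) */
} Op;

static bool reads_busy(const Op *op, uint32_t busy)
{
    busy &= ~1u;          /* register 0 is never busy */
    return ((busy >> op->src1) & 1u) || ((busy >> op->src2) & 1u);
}

/* How many instructions of the even/odd pair can issue this cycle. */
static unsigned issue_count(const Op *even, const Op *odd, uint32_t busy)
{
    assert(even->src1 < 32 && even->src2 < 32 && even->dest < 32);
    assert(odd->src1 < 32 && odd->src2 < 32);
    if (reads_busy(even, busy))
        return 0;                                /* pair held in RD */
    uint32_t busy_after_even = busy | (1u << even->dest);
    if (reads_busy(odd, busy_after_even))
        return 1;                                /* window slides by one word */
    return 2;
}
```

A return value of 1 corresponds to the case where only the even slot is scheduled and the window slides down one instruction, so the held odd instruction becomes the even member of the next pair.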
2.4.2 Instruction Execution

During instruction execution, a pair of instructions (or a single instruction, if the pipeline was previously blocked) is passed to independent execution units. Each execution unit receives its operands from the register bypass logic and an instruction from the instruction scheduler. Each single-cycle instruction spends one cycle in an execution unit, and the result is then fed to the register/bypass unit for the CR stage. Multiple-cycle instructions may spend longer than one cycle in an execution unit.
For load and store instructions, the cache lookup occurs during the CR stage. For load instructions, including loads to coprocessors, data is returned to the register/bypass unit during the CR stage. For all other instructions, CR and WB are holding stages used to hold the result of the EX stage for writeback to the register file.

2.5 Instruction Set Summary

Table 2.1 summarizes the instruction set for the CW4011. The CW4011 supports 32-bit MIPS II instructions and implements additional CW4011-specific instructions. If the design includes the optional MMU, the CW4011 also supports the TLB instructions. All instructions are 32 bits long. Table 2.1 includes only the MIPS II, CW4011-specific, and TLB instructions; with the exception of RFE, MIPS I instructions are not shown.
Table 2.1 CW4011 Instruction Set Summary

Load/Store Instructions: LB (load byte), LBU (load byte unsigned), LH (load halfword), LHU (load halfword unsigned), LW (load word), LWL (load word left), LWR (load word right), SB (store byte), SH (store halfword), SW (store word), SWL (store word left), SWR (store word right), LL¹ (load linked), SC¹ (store conditional), SYNC¹ (sync)

ALU Immediate Instructions: ADDI (add immediate), ADDIU (add immediate unsigned), SLTI (set on less than immediate), SLTIU (set on less than immediate unsigned), ANDI (and immediate), ORI (or immediate), XORI (exclusive or immediate), LUI (load upper immediate)

Three-Operand, Register-Type Arithmetic Instructions: ADD (add), ADDU (add unsigned), SUB (subtract), SUBU (subtract unsigned), SLT (set on less than), SLTU (set on less than unsigned), AND (and), OR (or), XOR (exclusive or), NOR (nor)

Shift Instructions: SLL (shift left logical), SRL (shift right logical), SRA (shift right arithmetic), SLLV (shift left logical variable), SRLV (shift right logical variable), SRAV (shift right arithmetic variable)

Multiply/Divide Instructions: MULT (multiply), MULTU (multiply unsigned), DIV (divide), DIVU (divide unsigned), MFHI (move from HI), MTHI (move to HI), MFLO (move from LO), MTLO (move to LO)

Computation Instruction Extensions: ADDCIU² (add circular immediate), FFS² (find first set bit), FFC² (find first clear bit), MIN² (minimum), MAX² (maximum), SELSR² (select and shift right), SELSL² (select and shift left), MADD² (multiply/add), MADDU² (multiply/add unsigned), MSUB² (multiply/subtract), MSUBU² (multiply/subtract unsigned)

Jump and Branch Instructions: J (jump), JAL (jump and link), JR (jump register), JALR (jump and link register), BEQ (branch on equal), BNE (branch on not equal), BLEZ (branch on less than or equal to zero), BGTZ (branch on greater than zero), BLTZ (branch on less than zero), BGEZ (branch on greater than or equal to zero), BLTZAL (branch on less than zero and link), BGEZAL (branch on greater than or equal to zero and link)

Branch Likely Instructions: BEQL¹ (branch on equal likely), BNEL¹ (branch on not equal likely), BLEZL¹ (branch on less than or equal to zero likely), BGTZL¹ (branch on greater than zero likely), BLTZL¹ (branch on less than zero likely), BGEZL¹ (branch on greater than or equal to zero likely), BLTZALL¹ (branch on less than zero and link likely), BGEZALL¹ (branch on greater than or equal to zero and link likely), BCzTL¹ (branch on coprocessor z true likely), BCzFL¹ (branch on coprocessor z false likely)

Trap Instructions: TEQ (trap on equal), TEQI (trap on equal immediate), TGE (trap on greater than or equal), TGEI (trap on greater than or equal immediate), TGEU (trap on greater than or equal unsigned), TGEIU (trap on greater than or equal immediate unsigned), TLT (trap on less than), TLTI (trap on less than immediate), TLTU (trap on less than unsigned), TLTIU (trap on less than immediate unsigned), TNE (trap if not equal), TNEI (trap if not equal immediate)

Special Instructions: SYSCALL (system call), BREAK (breakpoint)

Coprocessor Instructions: LWCz (load word to coprocessor z), SWCz (store word from coprocessor z), MTCz (move to coprocessor z), MFCz (move from coprocessor z), CTCz (move control to coprocessor z), CFCz (move control from coprocessor z), COPz (coprocessor operation), BCzT (branch on coprocessor z true), BCzF (branch on coprocessor z false)

System Control Coprocessor (CP0) Instructions: MTC0 (move to CP0), MFC0 (move from CP0), RFE (restore from exception, R3000 mode only), ERET (exception return, R4000 mode only), WAITI² (wait for interrupt), TLBWI³ (write indexed TLB entry), TLBWR³ (write random TLB entry), TLBP³ (probe TLB for matching entry), TLBR³ (read indexed TLB entry)

Cache Maintenance Instructions: FLUSHD²,⁴ (flush D-cache), FLUSHI²,⁴ (flush I-cache), FLUSHID²,⁴ (flush I-cache and D-cache), WB² (writeback)

1. MIPS II instruction.
2. CW4011-specific instruction.
3. Valid only with implemented MMU building block.
4. Do not confuse these instructions with the FLUSH instruction in R6000 processors.

In addition to the standard MIPS II instruction set, the CW4011 implements certain instruction set extensions, shown in Table 2.2, that provide greater application code performance for typical embedded applications. Instruction set extensions are included only if they significantly improve performance, have no impact on clock cycle rate, and have minimal impact on the size and complexity of the hardware.
Table 2.2 Instruction Set Extensions

Find First Set, Find First Clear (ffs rd, rs; ffc rd, rs): These instructions find, respectively, the first set bit and the first clear bit in the source register, and return the bit number to the destination register. They are useful in many applications, such as interrupt handlers, floating-point emulation, and graphics.

Select and Rotate Left, Select and Rotate Right (selsl rd, rs, rt; selsr rd, rs, rt): These instructions select 32 bits from the 64-bit source register pair and rotate the selected data left or right by the number of bits specified in the new CP0 Rotate register. They are useful for data alignment operations in graphics and in bit-field selection routines for data transmission and compression applications.

Add Circular Immediate (addciu rt, rs, immediate): This instruction performs an immediate add, modified according to the value in the new CP0 CMask register. It is useful for addressing circular buffers, and is therefore important in DSP (digital signal processing) and other applications that use circular buffers.

Multiply/Add, Multiply/Subtract (madd(u) rs, rt; msub(u) rs, rt): These instructions are useful in many signal processing and graphics transform algorithms. Implemented only with the high-performance multiply/accumulate unit, they perform a 32 x 32 multiply and then either add the result to or subtract it from the 64-bit HI/LO register pair.

Wait for Interrupt (waiti): This instruction halts the CPU in a power-saving mode until one of the hardware interrupt lines becomes active. Upon interrupt, normal execution resumes starting at the interrupt vector address.

Minimum (min rd, rs, rt): The source operands rs and rt are compared as two's complement values. The smaller value is stored in register rd.

Maximum (max rd, rs, rt): The source operands rs and rt are compared as two's complement values. The larger value is stored in register rd.
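The semantics of FFS, FFC, MIN, and MAX can be sketched in C. These are illustrative functions, not the hardware definition: the scan starts at the most-significant bit as Table 3.6 describes, bit numbers are assumed to follow the usual convention (bit 31 is the MSB), and an all-ones result signals that no matching bit exists. The `_insn` suffix just avoids clashing with the POSIX `ffs` function, which has different semantics.

```c
#include <assert.h>
#include <stdint.h>

/* FFS: scan from bit 31 down; return the first set bit's number, or all
   ones if rs is zero. Bit 31 = MSB is an assumed numbering convention. */
static uint32_t ffs_insn(uint32_t rs)
{
    for (int bit = 31; bit >= 0; bit--)
        if ((rs >> bit) & 1u)
            return (uint32_t)bit;
    return 0xFFFFFFFFu;
}

/* FFC: the first clear bit of rs is the first set bit of ~rs. */
static uint32_t ffc_insn(uint32_t rs)
{
    return ffs_insn(~rs);
}

/* MIN and MAX compare as two's complement values. */
static int32_t min_insn(int32_t rs, int32_t rt) { return rs < rt ? rs : rt; }
static int32_t max_insn(int32_t rs, int32_t rt) { return rs > rt ? rs : rt; }
```

In an interrupt handler, for example, applying FFS to a pending-interrupt mask yields the highest-numbered pending source in a single instruction instead of a bit-test loop.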
2.6 Configurability and Options

The CW4011 is implemented using Verilog HDL (hardware description language) as the design source, and the LSI Logic standard cell library and layout tools for physical design. You can easily modify and configure the CW4011 core to meet specific design requirements. The options available in the basic core are described in the following sections. VHDL models are also available.

2.6.1 Cache Sizes

The available instruction cache sizes are 0 to 16 Kbytes, direct mapped or two-way set associative. The available data cache sizes are 0 to 16 Kbytes, direct mapped or two-way set associative. See Chapter 6, CW4011 Caches, and Appendix B, Cache Sizing and Design Concerns, for more information about cache sizing.

2.6.2 High-Performance Multiply Accumulate Unit

Each project may choose a high-performance multiply unit that provides the base R3000 and R4000 multiply instructions (with similar performance) as well as the MADD and MSUB instructions. The high-performance multiplier is intended for applications with substantial multiply/accumulate performance needs. It includes a 32 x 32 pipelined array multiplier and a 64-bit accumulator that can retire a multiply or multiply/accumulate instruction every clock cycle, with a latency of three clock cycles per result.

2.6.3 64-Bit vs. 32-Bit Memory Interface

The CW4011 BIU supports a 32-bit sizing interface for cost-sensitive designs or applications with low memory bandwidth requirements. The BIU can be modified to present a 32-bit data bus instead of a 64-bit data bus.
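The throughput and latency figures above translate directly into a cycle estimate for a chain of multiply/accumulate operations. The sketch below is illustrative: `dot_product` models the 64-bit HI/LO accumulator in C, and `mac_cycles` assumes back-to-back MADDs with no loads, loop overhead, or final MFHI/MFLO, so it is only a rough lower bound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Dot product as a chain of MADDs accumulating into HI||LO. C signed
   overflow is undefined, unlike the hardware's 64-bit wraparound, so this
   model assumes the sum fits in 64 bits. */
static int64_t dot_product(const int32_t *a, const int32_t *b, size_t n)
{
    int64_t hilo = 0;                            /* models HI || LO */
    for (size_t i = 0; i < n; i++)
        hilo += (int64_t)a[i] * (int64_t)b[i];   /* one MADD per element */
    return hilo;
}

/* One MADD issued per cycle, 3-cycle latency: n issue cycles plus 2 more
   before the last result is available. */
static unsigned mac_cycles(unsigned n)
{
    return n == 0 ? 0 : n + 2;
}
```

Because the latency is paid only once for the whole chain, long dot products and FIR filters approach one multiply/accumulate per clock.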
2.6.4 Memory Management Unit

The CW4011 is designed to support the 32-bit addressing mode of the R4000 MMU. The TLB available in the base processor design contains up to 32 single-page entries. Each page can be individually specified to be 4 Kbytes or 16 Mbytes. For designs that do not require a full MMU implementation, the CW4011 can support a simple MMU; for designs with no TLB requirements, the TLB can be removed entirely to save silicon.
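The two page sizes determine how a 32-bit virtual address divides into a page number (matched against TLB entries) and an untranslated byte offset. The sketch below is illustrative only; `VaSplit` and `split_va` are hypothetical names, and the TLB lookup itself is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Split a 32-bit virtual address for a 4-Kbyte (shift 12) or 16-Mbyte
   (shift 24) page. Names are illustrative. */
typedef struct {
    uint32_t vpn;      /* virtual page number, compared against TLB entries */
    uint32_t offset;   /* byte offset within the page, not translated */
} VaSplit;

static VaSplit split_va(uint32_t va, unsigned page_shift)
{
    assert(page_shift == 12 || page_shift == 24);
    VaSplit s = { va >> page_shift, va & ((1u << page_shift) - 1u) };
    return s;
}
```

After a TLB hit, the physical address would be formed by placing the entry's page frame number above the same offset bits, so a 16-Mbyte page maps a large region with a single entry.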
Chapter 3 Instruction Set

This chapter presents an overview of the MIPS R-series instructions and the instruction set extensions supported in the CW4011. It contains the following sections:
- Section 3.1, Instruction Set Formats
- Section 3.2, Load and Store Instructions
- Section 3.3, Computational Instructions
- Section 3.4, Jump and Branch Instructions
- Section 3.5, Trap Instructions
- Section 3.6, Special Instructions
- Section 3.7, Coprocessor Instructions
- Section 3.8, System Control Coprocessor (CP0) Instructions
- Section 3.9, Cache Maintenance Instructions
- Section 3.10, CW4011 Instruction Set Extensions
- Section 3.11, CPU Instruction Opcode Bit Encoding

3.1 Instruction Set Formats

Every R-series instruction consists of a single word (32 bits) aligned on a word boundary. As shown in Figure 3.1, there are three instruction formats: I-type (immediate), J-type (jump), and R-type (register). This restricted format approach simplifies instruction decoding; the compiler and assembler can synthesize more complicated (and less frequently used) operations and addressing modes.
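Using the field positions of Figure 3.1, decoding an instruction word is a matter of shifts and masks. The sketch below extracts every field unconditionally; the `Fields` struct and `decode` function are illustrative names, and which fields are meaningful depends on the format implied by the opcode.

```c
#include <assert.h>
#include <stdint.h>

/* Field extraction for the I-, J-, and R-type formats (Figure 3.1). */
typedef struct {
    uint32_t op;       /* bits 31..26, all formats */
    uint32_t rs;       /* bits 25..21, I-type and R-type */
    uint32_t rt;       /* bits 20..16, I-type and R-type */
    uint32_t rd;       /* bits 15..11, R-type */
    uint32_t shamt;    /* bits 10..6,  R-type */
    uint32_t funct;    /* bits 5..0,   R-type */
    uint32_t imm;      /* bits 15..0,  I-type */
    uint32_t target;   /* bits 25..0,  J-type */
} Fields;

static Fields decode(uint32_t w)
{
    Fields f;
    f.op     = w >> 26;
    f.rs     = (w >> 21) & 0x1Fu;
    f.rt     = (w >> 16) & 0x1Fu;
    f.rd     = (w >> 11) & 0x1Fu;
    f.shamt  = (w >> 6)  & 0x1Fu;
    f.funct  = w & 0x3Fu;
    f.imm    = w & 0xFFFFu;
    f.target = w & 0x03FFFFFFu;
    return f;
}
```

For example, the word 0x00221821 decodes as op 0, rs 1, rt 2, rd 3, shamt 0, funct 0x21, which in the standard MIPS encoding is the R-type instruction addu $3, $1, $2.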
Figure 3.1 Instruction Format
I-type (immediate): op (bits 31-26) | rs (bits 25-21) | rt (bits 20-16) | immediate (bits 15-0)
J-type (jump): op (bits 31-26) | target (bits 25-0)
R-type (register): op (bits 31-26) | rs (bits 25-21) | rt (bits 20-16) | rd (bits 15-11) | shamt (bits 10-6) | funct (bits 5-0)
op: 6-bit operation code
rs: 5-bit source register specifier
rt: 5-bit target (source/destination) register
immediate: 16-bit immediate, branch displacement, or address displacement
target: 26-bit jump target address
rd: 5-bit destination register specifier
shamt: 5-bit shift amount
funct: 6-bit function field

3.2 Load and Store Instructions

Load and store instructions are all I-type instructions and move data between memory and general purpose registers. The only addressing mode directly supported in the base R-series architecture is base register plus 16-bit signed immediate offset. The MIPS II extensions add the load linked and store conditional instructions, which support multiple processors, and the SYNC instruction, which synchronizes loads and stores. The CW4011 supports these instructions.

The load/store instruction operation code (opcode) determines the access type, which in turn indicates the size of the data item to be loaded or stored. Regardless of access type or byte-numbering order (big-endian or little-endian), the address specifies the byte that has the smallest byte address of all the bytes in the addressed field. For a big-endian machine, the smallest byte is the leftmost byte; for a little-endian machine, it is the rightmost byte.

The bytes used within the addressed word can be determined directly from the access type and the two low-order bits of the address, as shown in Figure 3.2. Certain combinations of access type and low-order address bits can never occur; only the combinations shown in Figure 3.2 are allowed.

Figure 3.2 Byte Specifications for Loads/Stores (for each access type, the bytes accessed on the 64-bit data bus, bits 63 to 0, under big-endian and little-endian byte numbering; allowed low-order address bits A2 A1 A0: doubleword 000; word 000 and 100; tribyte 000, 001, 100, and 101; halfword 000, 010, 100, and 110; byte 000 through 111)
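Byte-lane selection on the 64-bit bus, combined with the sign or zero extension that distinguishes LB from LBU, can be sketched as follows. This assumes lane n occupies bus bits 8n+7 to 8n, with little-endian byte address a on lane a and big-endian byte address a on lane 7 - a, consistent with the byte numbering in Figure 3.2; `byte_lane` and `load_byte` are hypothetical helpers covering only the byte access type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Lane n occupies bus bits 8n+7..8n. */
static unsigned byte_lane(unsigned addr_low3, bool big_endian)
{
    assert(addr_low3 < 8);
    return big_endian ? 7u - addr_low3 : addr_low3;
}

/* LB sign-extends the addressed byte; LBU zero-extends it. */
static uint32_t load_byte(uint64_t bus, unsigned addr_low3, bool big_endian,
                          bool is_unsigned)
{
    uint32_t b = (uint32_t)(uint8_t)(bus >> (8u * byte_lane(addr_low3, big_endian)));
    if (is_unsigned || (b & 0x80u) == 0)
        return b;
    return b | 0xFFFFFF00u;       /* copy bit 7 into bits 31..8 */
}
```

The same address therefore selects opposite ends of the doubleword depending on byte order, which is why only the access type and low-order address bits, not the endianness, are needed to decide which combinations are legal.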
Table 3.1 describes the load and store instructions supported by the CW4011. Each entry gives the instruction name and format, for example, lb rt, offset(base), followed by its description.

Table 3.1 Load and Store Instruction Summary
Load Byte (lb rt, offset(base)): Sign-extend the 16-bit offset and add it to the contents of register base to form the address. Sign-extend the contents of the addressed byte and load the result into rt.
Load Byte Unsigned (lbu rt, offset(base)): Sign-extend the 16-bit offset and add it to the contents of register base to form the address. Zero-extend the contents of the addressed byte and load the result into rt.
Load Halfword (lh rt, offset(base)): Sign-extend the 16-bit offset and add it to the contents of register base to form the address. Sign-extend the contents of the addressed halfword and load the result into rt.
Load Halfword Unsigned (lhu rt, offset(base)): Sign-extend the 16-bit offset and add it to the contents of register base to form the address. Zero-extend the contents of the addressed halfword and load the result into rt.
Load Word (lw rt, offset(base)): Sign-extend the 16-bit offset and add it to the contents of register base to form the address, and load the addressed word into rt.
Load Word Left (lwl rt, offset(base)): Sign-extend the 16-bit offset and add it to the contents of register base to form the address. Shift the addressed word left so that the addressed byte is the leftmost byte of a word. Merge bytes from memory with the contents of register rt and load the result into register rt.
Load Word Right (lwr rt, offset(base)): Sign-extend the 16-bit offset and add it to the contents of register base to form the address. Shift the addressed word right so that the addressed byte is the rightmost byte of a word. Merge bytes from memory with the contents of register rt and load the result into register rt.
Store Byte (sb rt, offset(base)): Sign-extend the 16-bit offset and add it to the contents of register base to form the address. Store the least-significant byte of register rt at the addressed location.
Store Halfword (sh rt, offset(base)): Sign-extend the 16-bit offset and add it to the contents of register base to form the address. Store the least-significant halfword of register rt at the addressed location.
Store Word (sw rt, offset(base)): Sign-extend the 16-bit offset and add it to the contents of register base to form the address. Store the contents of register rt at the addressed location.
Store Word Left (swl rt, offset(base)): Sign-extend the 16-bit offset and add it to the contents of register base to form the address. Shift the contents of register rt left so that the leftmost byte of the word is in the position of the addressed byte. Store the word containing the shifted bytes into the word at the addressed byte.
Store Word Right (swr rt, offset(base)): Sign-extend the 16-bit offset and add it to the contents of register base to form the address. Shift the contents of register rt right so that the rightmost byte of the word is in the position of the addressed byte. Store the word containing the shifted bytes into the word at the addressed byte.
Load Linked (ll rt, offset(base)): Sign-extend the 16-bit offset and add it to the contents of register base to form the address, and load the addressed word into register rt.
Store Conditional (sc rt, offset(base)): Sign-extend the 16-bit offset and add it to the contents of register base to form the address. Conditionally store register rt at the address, based on whether the load link has been broken.
Sync (sync): Complete all outstanding load and store instructions before allowing any new load or store instruction to start.

3.3 Computational Instructions

Computational instructions perform arithmetic, logical, and shift operations on values in registers. They occur in both R-type (both operands are registers) and I-type (one operand is a 16-bit immediate) formats. There are five categories of computational instructions:
- Table 3.2 summarizes the ALU immediate instructions.
- Table 3.3 summarizes the three-operand, register-type instructions.
- Table 3.4 summarizes the shift instructions.
- Table 3.5 summarizes the multiply/divide instructions.
- Table 3.6 summarizes the computational CW4011 instruction extensions (CW4011 ISA).

Table 3.2 ALU Immediate Instruction Summary
Add Immediate (addi rt, rs, immediate): Add the 16-bit, sign-extended immediate to register rs and place the 32-bit result in register rt. Trap on two's complement overflow.
Add Immediate Unsigned (addiu rt, rs, immediate): Add the 16-bit, sign-extended immediate to register rs and place the 32-bit result in register rt. Do not trap on overflow.
Set on Less Than Immediate (slti rt, rs, immediate): Compare the 16-bit, sign-extended immediate with register rs as signed 32-bit integers. Result = 1 if rs is less than immediate; otherwise, result = 0. Place the result in register rt.
Set on Less Than Immediate Unsigned (sltiu rt, rs, immediate): Compare the 16-bit, sign-extended immediate with register rs as unsigned 32-bit integers. Result = 1 if rs is less than immediate; otherwise, result = 0. Place the result in register rt.
And Immediate (andi rt, rs, immediate): Zero-extend the 16-bit immediate, AND it with the contents of register rs, and place the result in register rt.
Or Immediate (ori rt, rs, immediate): Zero-extend the 16-bit immediate, OR it with the contents of register rs, and place the result in register rt.
Exclusive Or Immediate (xori rt, rs, immediate): Zero-extend the 16-bit immediate, exclusive-OR it with the contents of register rs, and place the result in register rt.
Load Upper Immediate (lui rt, immediate): Shift the 16-bit immediate left 16 bits and set the least-significant 16 bits of the word to zeros. Store the result in register rt.
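The key difference among the ALU immediate instructions is how the 16-bit immediate is extended. This illustrative sketch (function names are hypothetical) shows sign extension for the arithmetic and compare forms, zero extension for the logical forms, and the SLTIU case, where the immediate is sign-extended but the comparison is unsigned.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t sign_ext16(uint16_t imm)
{
    return (imm & 0x8000u) ? (uint32_t)imm | 0xFFFF0000u : (uint32_t)imm;
}

static uint32_t addiu_op(uint32_t rs, uint16_t imm) { return rs + sign_ext16(imm); }
static uint32_t ori_op(uint32_t rs, uint16_t imm)   { return rs | (uint32_t)imm; }  /* zero-extended */
static uint32_t lui_op(uint16_t imm)                { return (uint32_t)imm << 16; }

/* SLTIU: sign-extend the immediate, then compare as unsigned 32-bit values. */
static uint32_t sltiu_op(uint32_t rs, uint16_t imm)
{
    return rs < sign_ext16(imm) ? 1u : 0u;
}
```

A common idiom is LUI followed by ORI to build an arbitrary 32-bit constant; because ORI zero-extends, the low half is inserted without disturbing the upper half.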
Table 3.3 Three-Operand, Register-Type Instruction Summary
Add (add rd, rs, rt): Add the contents of registers rs and rt and place the 32-bit result in register rd. Trap on two's complement overflow.
Add Unsigned (addu rd, rs, rt): Add the contents of registers rs and rt and place the 32-bit result in register rd. Do not trap on overflow.
Subtract (sub rd, rs, rt): Subtract the contents of register rt from rs and place the 32-bit result in register rd. Trap on two's complement overflow.
Subtract Unsigned (subu rd, rs, rt): Subtract the contents of register rt from rs and place the 32-bit result in register rd. Do not trap on overflow.
Set on Less Than (slt rd, rs, rt): Compare the contents of register rt to register rs (as signed 32-bit integers). If register rs is less than rt, rd = 1; otherwise, rd = 0.
Set on Less Than Unsigned (sltu rd, rs, rt): Compare the contents of register rt to register rs (as unsigned 32-bit integers). If register rs is less than rt, rd = 1; otherwise, rd = 0.
And (and rd, rs, rt): Bitwise AND the contents of registers rs and rt and place the result in register rd.
Or (or rd, rs, rt): Bitwise OR the contents of registers rs and rt and place the result in register rd.
Exclusive Or (xor rd, rs, rt): Bitwise exclusive-OR the contents of registers rs and rt and place the result in register rd.
Nor (nor rd, rs, rt): Bitwise NOR the contents of registers rs and rt and place the result in register rd.
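The "trap on two's complement overflow" condition that separates ADD/SUB from ADDU/SUBU can be sketched as a sign check. In this illustrative model (`add_traps` and `sub_traps` are hypothetical names), the trap is reported as a return flag, and rd is written only when no overflow occurs, matching the standard MIPS rule that the destination is left unchanged when the overflow trap is taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ADD: overflow when both operands share a sign and the result's differs. */
static bool add_traps(uint32_t rs, uint32_t rt, uint32_t *rd)
{
    uint32_t sum = rs + rt;                                  /* ADDU result */
    bool overflow = ((~(rs ^ rt) & (rs ^ sum)) >> 31) != 0;
    if (!overflow)
        *rd = sum;                                           /* rd unchanged on a trap */
    return overflow;
}

/* SUB: overflow when the operands differ in sign and the result's sign
   differs from rs. */
static bool sub_traps(uint32_t rs, uint32_t rt, uint32_t *rd)
{
    uint32_t diff = rs - rt;                                 /* SUBU result */
    bool overflow = (((rs ^ rt) & (rs ^ diff)) >> 31) != 0;
    if (!overflow)
        *rd = diff;
    return overflow;
}
```

ADDU and SUBU compute the same wrapped results but never trap, which is why compilers use them for address arithmetic.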
Table 3.4 Shift Instruction Summary
Shift Left Logical (sll rd, rt, shamt): Shift the contents of register rt left by shamt bits, inserting zeros into the low-order bits. Place the 32-bit result in register rd.
Shift Right Logical (srl rd, rt, shamt): Shift the contents of register rt right by shamt bits, inserting zeros into the high-order bits. Place the 32-bit result in register rd.
Shift Right Arithmetic (sra rd, rt, shamt): Shift the contents of register rt right by shamt bits, sign-extending the high-order bits. Place the 32-bit result in register rd.
Shift Left Logical Variable (sllv rd, rt, rs): Shift the contents of register rt left. The low-order 5 bits of register rs specify the number of bits to shift. Insert zeros into the low-order bits of rt and place the 32-bit result in register rd.
Shift Right Logical Variable (srlv rd, rt, rs): Shift the contents of register rt right. The low-order 5 bits of register rs specify the number of bits to shift. Insert zeros into the high-order bits of rt and place the 32-bit result in register rd.
Shift Right Arithmetic Variable (srav rd, rt, rs): Shift the contents of register rt right. The low-order 5 bits of register rs specify the number of bits to shift. Sign-extend the high-order bits of rt and place the 32-bit result in register rd.
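The difference between the logical and arithmetic right shifts, and the use of only the low-order 5 bits of rs in the variable forms, can be sketched in C. SRA is written explicitly here because right-shifting a negative signed integer is implementation-defined in C; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t srl_op(uint32_t rt, unsigned sa) { return rt >> (sa & 31u); }

static uint32_t sra_op(uint32_t rt, unsigned sa)
{
    sa &= 31u;
    uint32_t r = rt >> sa;
    if ((rt & 0x80000000u) && sa != 0)
        r |= ~(0xFFFFFFFFu >> sa);       /* fill vacated high bits with the sign bit */
    return r;
}

/* Variable shifts: only the low-order 5 bits of rs give the shift count. */
static uint32_t srav_op(uint32_t rt, uint32_t rs) { return sra_op(rt, rs & 0x1Fu); }
static uint32_t sllv_op(uint32_t rt, uint32_t rs) { return rt << (rs & 0x1Fu); }
```

For example, a count of 36 in rs behaves as a shift by 4, since only bits 4 to 0 are used.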
Table 3.5 Multiply/Divide Instruction Summary
Multiply (mult rs, rt): Multiply the contents of registers rs and rt as two's complement values. Place the 64-bit result in special registers HI and LO.
Multiply Unsigned (multu rs, rt): Multiply the contents of registers rs and rt as unsigned values. Place the 64-bit result in special registers HI and LO.
Divide (div rs, rt): Divide the contents of register rs by the contents of rt as two's complement values. Place the 32-bit quotient in special register LO and the 32-bit remainder in HI.
Divide Unsigned (divu rs, rt): Divide the contents of register rs by the contents of rt as unsigned values. Place the 32-bit quotient in special register LO and the 32-bit remainder in HI.
Move from HI (mfhi rd): Move the contents of special register HI to register rd.
Move from LO (mflo rd): Move the contents of special register LO to register rd.
Move to HI (mthi rs): Move the contents of register rs to special register HI.
Move to LO (mtlo rs): Move the contents of register rs to special register LO.
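How MULT and DIV distribute their results across HI and LO can be sketched as follows. This is an illustrative model (`HiLo`, `mult_op`, and `div_op` are hypothetical names). C's `/` truncates toward zero and `%` takes the sign of the dividend, which is assumed here to match the signed DIV behavior; division by zero and the INT32_MIN / -1 case are excluded rather than modelled.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t hi, lo; } HiLo;

/* MULT: 64-bit signed product, upper half in HI, lower half in LO. */
static HiLo mult_op(int32_t rs, int32_t rt)
{
    int64_t p = (int64_t)rs * (int64_t)rt;
    HiLo r = { (uint32_t)((uint64_t)p >> 32), (uint32_t)p };
    return r;
}

/* DIV: quotient in LO, remainder in HI. */
static HiLo div_op(int32_t rs, int32_t rt)
{
    assert(rt != 0);                            /* result undefined: excluded */
    assert(!(rs == INT32_MIN && rt == -1));     /* overflow case: excluded */
    HiLo r = { (uint32_t)(rs % rt), (uint32_t)(rs / rt) };
    return r;
}
```

Software then retrieves the results with MFHI and MFLO; a 32-bit product that fits in LO only needs MFLO.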
Table 3.6 Computation Instruction Extensions Summary (CW4011 ISA)
Add Circular Immediate (addciu rt, rs, immediate): The 16-bit immediate is sign-extended and added to the contents of general register rs, with the result masked by the value in CP0 register CMask according to the formula $rt = rs_{31 \ldots \mathit{cmask}} \,\|\, (rs + \mathit{signextended\_imed})_{\mathit{cmask}-1 \ldots 0}$.
Find First Set Bit (ffs rd, rs): Starting at the most-significant bit in register rs, find the first bit that is set to one, and return the bit number in register rd. If no bit is set, return with all bits of rd set to 1.
Find First Clear Bit (ffc rd, rs): Starting at the most-significant bit in register rs, find the first bit that is set to zero, and return the bit number in register rd. If no bit is clear, return with all bits of rd set to 1.
Minimum (min rd, rs, rt): Compare the contents of registers rs and rt as two's complement values. The smaller value is stored in register rd.
Maximum (max rd, rs, rt): Compare the contents of registers rs and rt as two's complement values. The larger value is stored in register rd.
Select and Shift Right (selsr rd, rs, rt): Using registers rs and rt as a 64-bit register pair and the CP0 Rotate register as the shift count, shift the register pair rs || rt right by the number of bits specified in Rotate, and place the least-significant 32-bit value in result register rd.
Select and Shift Left (selsl rd, rs, rt): Using registers rs and rt as a 64-bit register pair and the CP0 Rotate register as the shift count, shift the register pair rs || rt left by the number of bits specified in Rotate, and place the most-significant 32-bit value in result register rd.
Multiply/Add (madd rs, rt): Multiply the contents of registers rs and rt as two's complement values. Add the 64-bit result to the contents of special register pair HI/LO, and place the result in HI and LO.
Multiply/Add Unsigned (maddu rs, rt): Multiply the contents of registers rs and rt as unsigned values. Add the 64-bit result to the contents of special register pair HI/LO, and place the result in HI and LO.
Multiply/Subtract (msub rs, rt): Multiply the contents of registers rs and rt as two's complement values. Subtract the 64-bit result from the contents of special register pair HI/LO, and place the result in HI and LO.
Multiply/Subtract Unsigned (msubu rs, rt): Multiply the contents of registers rs and rt as unsigned values. Subtract the 64-bit result from the contents of special register pair HI/LO, and place the result in HI and LO.
Table 3.7 shows the execution time of the multiply, divide, and multiply-accumulate instructions.

3.4 Jump and Branch Instructions

Jump and branch instructions change the control flow of a program. MIPS I jump and branch instructions always take effect with a one-instruction delay: the instruction immediately following the jump or branch is always executed while the target instruction is being fetched from storage. There may be additional cycle penalties, depending on circumstances and implementation, but these penalties are interlocked in hardware. The MIPS II ISA extensions add the branch likely class of instructions, which operate exactly like their non-likely counterparts except that, when the branch is not taken, the instruction following the branch is cancelled.

The J-type instruction format is used for both jump and jump-and-link instructions for subroutine calls. In the J-type format, the 26-bit target address is shifted left two bits and combined with the 4 high-order bits of the current program counter to form a 32-bit absolute address. The R-type instruction format, which takes a 32-bit byte address contained in a register, is used for returns, dispatches, and cross-page jumps.

Table 3.7 Execution Time of Multiply and Divide Instructions (in cycles)

Operation             R3000   CW33300        R4000   CW4011
High-speed multiply   12      1 + (bits/3)   10      3
Multiply/add          NA      NA             NA      3 (note 1)
Divide                34      34             69      34/17 (note 2)

1. For high-speed CW4011 multiply/add instructions, the instructions can be pipelined for a throughput of one operation every clock cycle, while the latency is three cycles. Pipelining accelerates calculations such as dot products and FIR filters, which perform a series of multiplies/adds to compute a single result.
2. The divide time is shortened to 17 cycles if the divisor has fewer than 16 significant bits.
Branches have 16-bit signed offsets relative to the program counter (I-type). Jump-and-link and branch-and-link instructions save a return address in register 31. Table 3.8 summarizes the R-series jump instructions, Table 3.9 summarizes the branch instructions, and Table 3.10 summarizes the branch likely instructions.

Table 3.8 Jump Instruction Summary

Jump: j target
    Shift the 26-bit target address left two bits, combine it with the four high-order bits of the PC, and jump to the resulting address with a one-instruction delay.
Jump and Link: jal target
    Shift the 26-bit target address left two bits, combine it with the four high-order bits of the PC, and jump to the resulting address with a one-instruction delay. Place the address of the instruction following the delay slot in register 31 (link register).
Jump Register: jr rs
    Jump to the address contained in register rs with a one-instruction delay.
Jump and Link Register: jalr rs, rd
    Jump to the address contained in register rs with a one-instruction delay. Place the address of the instruction following the delay slot in register rd.
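The J-type target computation used by j and jal can be written as a small C function. It is a sketch under one stated assumption: `pc` is taken to be the address of the delay-slot instruction, which is the usual MIPS reading of "the current program counter". The name `jtype_target` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Target of a J-type jump (j, jal).
   instr: the 32-bit instruction word; bits [25:0] hold the target field.
   pc:    assumed to be the address of the delay-slot instruction. */
static uint32_t jtype_target(uint32_t instr, uint32_t pc) {
    uint32_t target_field = instr & 0x03FFFFFFu;   /* bits [25:0] */
    return (pc & 0xF0000000u)                       /* four high-order PC bits */
         | (target_field << 2);                     /* word index -> byte address */
}
```

Because the upper four PC bits are reused, j and jal can only reach targets in the same 256-MB region; the register forms jr and jalr cover cross-page jumps.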
Table 3.9 Branch Instruction Summary

Branch on Equal: beq rs, rt, offset
    Branch to the target address (note 1) if register rs is equal to register rt.
Branch on Not Equal: bne rs, rt, offset
    Branch to the target address if register rs does not equal register rt.
Branch on Less Than or Equal to Zero: blez rs, offset
    Branch to the target address if register rs is less than or equal to 0.
Branch on Greater Than Zero: bgtz rs, offset
    Branch to the target address if register rs is greater than 0.
Branch on Less Than Zero: bltz rs, offset
    Branch to the target address if register rs is less than 0.
Branch on Greater Than or Equal to Zero: bgez rs, offset
    Branch to the target address if register rs is greater than or equal to 0.
Branch on Less Than Zero and Link: bltzal rs, offset
    Place the address of the instruction following the delay slot in register 31 (link register). Branch to the target address if register rs is less than 0.
Branch on Greater Than or Equal to Zero and Link: bgezal rs, offset
    Place the address of the instruction following the delay slot in register 31 (link register). Branch to the target address if register rs is greater than or equal to 0.

1. All branch-instruction target addresses are computed as follows: add the address of the instruction in the delay slot and the 16-bit offset (shifted left two bits and sign-extended to 32 bits). All branches occur with a delay of one instruction.
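Note 1 of Table 3.9 amounts to a few lines of integer arithmetic. The sketch below follows that note literally; `branch_target` is a hypothetical helper, and the sign extension is done with unsigned arithmetic so the C code has no implementation-defined conversions.

```c
#include <assert.h>
#include <stdint.h>

/* Branch target per note 1 of Table 3.9:
   delay-slot address + (sign-extended 16-bit offset << 2).
   branch_addr: address of the branch instruction itself. */
static uint32_t branch_target(uint32_t branch_addr, uint32_t instr) {
    uint32_t imm  = instr & 0xFFFFu;             /* offset field, bits [15:0] */
    uint32_t sext = (imm ^ 0x8000u) - 0x8000u;   /* sign-extend to 32 bits (mod 2^32) */
    uint32_t delay_slot = branch_addr + 4u;      /* instruction after the branch */
    return delay_slot + (sext << 2);             /* wraps modulo 2^32 */
}
```

For example, an offset of -1 (0xFFFF) yields the branch's own address, which is how a tight "branch to self" loop is encoded.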
Table 3.10 Branch Likely Instruction Summary (MIPS II ISA Extensions)

Branch on Equal Likely: beql rs, rt, offset
    Branch to the target address (note 1) if register rs is equal to register rt.
Branch on Not Equal Likely: bnel rs, rt, offset
    Branch to the target address if register rs does not equal register rt.
Branch on Less Than or Equal to Zero Likely: blezl rs, offset
    Branch to the target address if register rs is less than or equal to 0.
Branch on Greater Than Zero Likely: bgtzl rs, offset
    Branch to the target address if register rs is greater than 0.
Branch on Less Than Zero Likely: bltzl rs, offset
    Branch to the target address if register rs is less than 0.
Branch on Greater Than or Equal to Zero Likely: bgezl rs, offset
    Branch to the target address if register rs is greater than or equal to 0.
Branch on Less Than Zero and Link Likely: bltzall rs, offset
    Place the address of the instruction following the delay slot in register 31 (link register). Branch to the target address if register rs is less than 0.
Branch on Greater Than or Equal to Zero and Link Likely: bgezall rs, offset
    Place the address of the instruction following the delay slot in register 31 (link register). Branch to the target address if register rs is greater than or equal to 0.

1. All branch-instruction target addresses are computed as follows: add the address of the instruction in the delay slot and the 16-bit offset (shifted left two bits and sign-extended to 32 bits). All branches occur with a delay of one instruction.
3.5 Trap Instructions

Trap instructions are part of the MIPS II instruction set. They conditionally raise an exception, based on the same conditions tested in the branch instructions. Table 3.11 summarizes these MIPS II ISA extensions.

Table 3.11 Trap Instruction Summary (MIPS II ISA Extensions)

Trap on Equal: teq rs, rt
    Trap if register rs is equal to register rt.
Trap on Equal Immediate: teqi rs, immediate
    Trap if register rs is equal to the immediate value.
Trap on Greater Than or Equal: tge rs, rt
    Trap if register rs is greater than or equal to register rt.
Trap on Greater Than or Equal Immediate: tgei rs, immediate
    Trap if register rs is greater than or equal to the immediate value.
Trap on Greater Than or Equal Unsigned: tgeu rs, rt
    Trap if register rs is greater than or equal to register rt, comparing as unsigned values.
Trap on Greater Than or Equal Immediate Unsigned: tgeiu rs, immediate
    Trap if register rs is greater than or equal to the immediate value, comparing as unsigned values.
Trap on Less Than: tlt rs, rt
    Trap if register rs is less than register rt.
Trap on Less Than Immediate: tlti rs, immediate
    Trap if register rs is less than the immediate value.
Trap on Less Than Unsigned: tltu rs, rt
    Trap if register rs is less than register rt, comparing as unsigned values.
Trap on Less Than Immediate Unsigned: tltiu rs, immediate
    Trap if register rs is less than the immediate value, comparing as unsigned values.
Trap if Not Equal: tne rs, rt
    Trap if register rs is not equal to register rt.
Trap if Not Equal Immediate: tnei rs, immediate
    Trap if register rs is not equal to the immediate value.
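The only subtlety in Table 3.11 is signed versus unsigned comparison. The sketch below evaluates each trap condition; `trap_condition` and the `trap_cond_t` names are hypothetical. It assumes, as on other MIPS II processors, that the immediate forms compare against the sign-extended 16-bit immediate, including tgeiu and tltiu.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { TRAP_GE, TRAP_GEU, TRAP_LT, TRAP_LTU, TRAP_EQ, TRAP_NE } trap_cond_t;

/* Flipping bit 31 maps two's complement order onto unsigned order,
   so signed comparisons need no implementation-defined casts. */
static uint32_t bias(uint32_t x) { return x ^ 0x80000000u; }

/* True if the trap condition holds for operands rs and rt.
   For the immediate forms, pass the sign-extended immediate as rt. */
static bool trap_condition(trap_cond_t c, uint32_t rs, uint32_t rt) {
    switch (c) {
    case TRAP_GE:  return bias(rs) >= bias(rt);
    case TRAP_GEU: return rs >= rt;
    case TRAP_LT:  return bias(rs) <  bias(rt);
    case TRAP_LTU: return rs <  rt;
    case TRAP_EQ:  return rs == rt;
    case TRAP_NE:  return rs != rt;
    }
    return false;
}
```

The same bit pattern 0xFFFFFFFF is -1 for tlt but the largest value for tltu, which is why the table lists both forms.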
3.6 Special Instructions

Special instructions cause an unconditional branch to the general exception-handling vector. Special instructions are always R-type and are summarized in Table 3.12.

Table 3.12 Special Instruction Summary

System Call: syscall
    Initiates a system call trap, immediately transferring control to the exception handler.
Breakpoint: break
    Initiates a breakpoint trap, immediately transferring control to the exception handler.

3.7 Coprocessor Instructions

The CW4011 supports external (on-chip) coprocessors and implements the coprocessor instruction set. Please contact LSI Logic if your design needs more than one coprocessor. Coprocessor branch instructions are J-type. Table 3.13 summarizes the coprocessor instructions.
Table 3.13 Coprocessor Instruction Summary

Load Word to Coprocessor z: lwcz rt, offset(base)
    Sign-extends the 16-bit offset and adds it to the contents of general register base to form a 32-bit unsigned effective address. The word at the specified memory location is loaded into register rt of coprocessor unit z.
Store Word from Coprocessor z: swcz rt, offset(base)
    Sign-extends the 16-bit offset and adds it to the contents of general register base to form a 32-bit unsigned effective address. The contents of register rt of coprocessor unit z are stored at the address specified by the 32-bit unsigned effective address.
Move to Coprocessor z: mtcz rt, rd
    Loads the contents of general register rt into register rd of coprocessor unit z.
Move from Coprocessor z: mfcz rt, rd
    Loads the contents of register rd of coprocessor unit z into general register rt.
Move Control to Coprocessor z: ctcz rt, rd
    Loads the contents of general register rt into control register rd of coprocessor unit z.
Move Control from Coprocessor z: cfcz rt, rd
    Loads the contents of control register rd of coprocessor unit z into general register rt.
Coprocessor Operation: copz cofun
    Initiates a coprocessor operation that may specify and reference the coprocessor's internal registers or change the state of the coprocessor's condition line, but does not change the state within the processor or the cache memory.
Branch on Coprocessor z True (Likely): bczt offset (bcztl offset)
    Computes a branch target address by adding the address of the instruction to the 16-bit offset (shifted left two bits and sign-extended to 32 bits). Branches to the target address (with a delay of one instruction) if coprocessor z's condition line is true. For branch likely, the delay slot instruction is not executed when the branch is not taken.
Branch on Coprocessor z False (Likely): bczf offset (bczfl offset)
    Computes a branch target address by adding the address of the instruction to the 16-bit offset (shifted left two bits and sign-extended to 32 bits). Branches to the target address (with a delay of one instruction) if coprocessor z's condition line is false. For branch likely, the delay slot instruction is not executed when the branch is not taken.
3.8 System Control Coprocessor (CP0) Instructions

Coprocessor 0 instructions perform operations on the system control coprocessor (CP0) registers to manipulate the memory management and exception-handling facilities of the processor. Table 3.14 summarizes the CP0 instructions.

If the TLB is removed, the TLB instructions (tlbr, tlbwi, tlbwr, tlbp) cause a reserved instruction (RI) exception. If the CW4011 is in R3000 compatibility mode, the eret (exception return) instruction is unavailable and causes an RI exception. Conversely, if the CW4011 is in R4000 mode, the rfe (restore from exception) instruction is unavailable and causes an RI exception.

Table 3.14 CP0 Instruction Summary

Move to CP0: mtc0 rt, rd
    Loads the contents of CPU register rt into CP0 register rd.
Move from CP0: mfc0 rt, rd
    Loads the contents of CP0 register rd into CPU register rt.
Read Indexed TLB Entry (note 1): tlbr
    Loads EntryHi and EntryLo with the TLB entry pointed to by the Index register.
Write Indexed TLB Entry (note 1): tlbwi
    Loads the TLB entry pointed to by the Index register with the contents of the EntryHi and EntryLo registers.
Write Random TLB Entry (note 1): tlbwr
    Loads the TLB entry pointed to by the Random register with the contents of the EntryHi and EntryLo registers.
Probe TLB for Matching Entry (note 1): tlbp
    Loads the Index register with the address of the TLB entry whose contents match the EntryHi and EntryLo registers. If no TLB entry matches, sets the high-order bit of the Index register.
Exception Return (note 2): eret (R4000 mode)
    Loads the PC from ErrorEPC (SR2 = 1: error exception) or EPC (SR2 = 0: exception) and clears the ERL bit (SR2 = 1) or the EXL bit (SR2 = 0) in the Status register. SR2 is Status register bit 2.
Restore from Exception (note 2): rfe (R3000 mode)
    Restores the previous interrupt mask and mode bits of the Status register into the current status bits, and restores the old status bits into the previous status bits.
Wait for Interrupt: waiti
    Stops instruction execution and places the processor in a power-save (stall) condition until a hardware interrupt, NMI, or reset is received.

1. If there is no MMU installed, any of these instructions can cause a reserved instruction exception.
2. Only one of these instructions is legal at any one time. The one that is not legal causes a reserved instruction exception.
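The eret row of Table 3.14 describes a two-way choice driven by Status bit 2 (ERL). The sketch below models that choice on a hypothetical `cp0_state_t` snapshot. Placing EXL at Status bit 1 is an assumption taken from the R4000 layout, since the table names the EXL bit but not its position; Section 4.3.6 is the authoritative description.

```c
#include <assert.h>
#include <stdint.h>

/* SR2 (ERL) comes from Table 3.14; EXL at bit 1 is an R4000-style assumption. */
#define SR_EXL (1u << 1)
#define SR_ERL (1u << 2)

/* Hypothetical snapshot of the CP0 state that eret reads and writes. */
typedef struct {
    uint32_t pc;
    uint32_t status;
    uint32_t epc;
    uint32_t error_epc;
} cp0_state_t;

/* eret (R4000 mode): return from an error exception if ERL is set,
   otherwise from an ordinary exception. */
static void model_eret(cp0_state_t *s) {
    if (s->status & SR_ERL) {
        s->pc = s->error_epc;      /* SR2 = 1: error exception */
        s->status &= ~SR_ERL;
    } else {
        s->pc = s->epc;            /* SR2 = 0: ordinary exception */
        s->status &= ~SR_EXL;
    }
}
```

Note that when ERL is set, only ERL is cleared; an EXL bit that was also set survives, as the first check in the test shows.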
3.9 Cache Maintenance Instructions

Cache maintenance instructions are always I-type. Table 3.15 summarizes these instructions.

Table 3.15 Cache Maintenance Instruction Summary

Flush I-Cache: flushi
    Flushes the I-cache; requires 256 stall cycles.
Flush D-Cache: flushd
    Flushes the D-cache; requires 256 stall cycles.
Flush I-Cache and D-Cache: flushid
    Flushes both the I-cache and the D-cache in 256 stall cycles.
Writeback: wb offset(base)
    Writes back the D-cache line addressed by offset + GPR[base].
3.10 CW4011 Instruction Set Extensions

This section defines the CW4011 instruction set extensions. Table 3.16 lists all the extensions and the page where each description can be found.

Table 3.16 CW4011 Instruction Set Extensions

Extension   Page    Extension   Page
addciu      3-22    max         3-30
ffc         3-23    min         3-31
ffs         3-24    msub        3-32
flushd      3-25    msubu       3-33
flushi      3-26    selsl       3-34
flushid     3-27    selsr       3-35
madd        3-28    waiti       3-36
maddu       3-29    wb          3-37
addciu  Add Circular Immediate

Format
    31-26    25-21    20-16    15-0
    addciu   rs       rt       immediate
    011100   rs       rt       immediate

Syntax
    addciu rt, rs, immediate

Description
    The immediate field of the instruction is sign-extended and added to the contents of general register rs. The result is masked with the expanded value in special register CMASK according to the equation shown below. The CMASK register is CP0 register number 24, whose valid bits are [4:0]. Carries resulting from the addition of the sign-extended offset are not propagated into the final result beyond bit [cmask-1].

Operation
    T: sign_extend_immed = (immediate[15])^16 || immediate[15..0]
       GPR[rt] = GPR[rs][31..cmask] || (GPR[rs] + sign_extend_immed)[cmask-1..0]

Exceptions
    None
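The addciu Operation section translates directly into mask arithmetic, and the natural use is stepping a pointer through a power-of-two circular buffer without a compare-and-branch. This is a sketch: `model_addciu` is a hypothetical name, `cmask_reg` stands for the CP0 CMASK register, and the 64-byte buffer in the test is an illustrative assumption (it must be aligned on a 64-byte boundary for cmask = 6 to wrap within it).

```c
#include <assert.h>
#include <stdint.h>

/* Model of addciu rt, rs, immediate following the Operation section:
   GPR[rt] = GPR[rs][31..cmask] || (GPR[rs] + sign_extend_immed)[cmask-1..0]
   cmask_reg is the CP0 CMASK register; only bits [4:0] are used. */
static uint32_t model_addciu(uint32_t rs, uint16_t immediate, uint32_t cmask_reg) {
    uint32_t cmask    = cmask_reg & 0x1Fu;                     /* valid bits [4:0] */
    uint32_t sext     = ((uint32_t)immediate ^ 0x8000u) - 0x8000u;
    uint32_t sum      = rs + sext;                             /* full 32-bit add */
    uint32_t low_mask = (1u << cmask) - 1u;                    /* bits [cmask-1..0] */
    return (rs & ~low_mask)     /* upper bits come from rs unchanged */
         | (sum & low_mask);    /* carries above bit cmask-1 are dropped */
}
```

With cmask = 6, adding 4 to the last word of the buffer wraps to the first word, and adding -4 to the first word wraps to the last.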
ffc  Find First Clear Bit

Format
    31-26    25-21    20-16    15-11    10-6     5-0
    special  rs       0        rd       0        ffc
    000000   rs       00000    rd       00000    001011

Syntax
    ffc rd, rs

Description
    The contents of general register rs are examined starting with the most-significant bit. The bit number of the first clear bit is returned in general register rd. If no bit is clear, all ones are returned in rd.

Exceptions
    None
ffs  Find First Set Bit

Format
    31-26    25-21    20-16    15-11    10-6     5-0
    special  rs       0        rd       0        ffs
    000000   rs       00000    rd       00000    001010

Syntax
    ffs rd, rs

Description
    The contents of general register rs are examined starting with the most-significant bit. The bit number of the first set bit is returned in general register rd. If no bit is set, all ones are returned in rd.

Exceptions
    None
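A reference model of ffs and ffc is handy for testing code that relies on them. The sketch assumes that "bit number" means the bit's position 31..0 (so the MSB is bit 31), which the description implies but does not state outright; `model_ffs`, `model_ffc`, and `FFX_NONE` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define FFX_NONE 0xFFFFFFFFu   /* "all ones are returned in rd" */

/* ffs rd, rs: scan from bit 31 toward bit 0 and return the position
   of the first 1 bit, or FFX_NONE when rs == 0. */
static uint32_t model_ffs(uint32_t rs) {
    for (int bit = 31; bit >= 0; --bit) {
        if (rs & (1u << bit)) {
            return (uint32_t)bit;
        }
    }
    return FFX_NONE;
}

/* ffc rd, rs: the first 0 bit of rs is the first 1 bit of ~rs. */
static uint32_t model_ffc(uint32_t rs) {
    return model_ffs(~rs);
}
```

A typical use is a free-slot bitmap: ffc on the allocation word gives the highest-numbered free slot in one instruction.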
flushd  Flush Data Cache

Format
    31-26    25-21    20-16    15-0
    cache    0        flushd   0
    101111   00000    00010    0

Syntax
    flushd

Description
    flushd flushes all D-cache lines and causes 256 stall cycles, regardless of the cache size.

Exceptions
    None
flushi  Flush Instruction Cache

Format
    31-26    25-21    20-16    15-0
    cache    0        flushi   0
    101111   00000    00001    0

Syntax
    flushi

Description
    flushi flushes all I-cache lines and causes 256 stall cycles, regardless of the cache size.

Exceptions
    None
flushid  Flush Instruction and Data Cache

Format
    31-26    25-21    20-16    15-0
    cache    0        flushid  0
    101111   00000    00011    0

Syntax
    flushid

Description
    flushid flushes all D-cache and I-cache lines and causes 256 stall cycles, regardless of the cache size.

Exceptions
    None
madd  Multiply/Add

Format
    31-26    25-21    20-16    15-11    10-6     5-0
    special  rs       rt       0        0        madd
    000000   rs       rt       00000    00000    011100

Syntax
    madd rs, rt

Description
    The contents of general registers rs and rt are multiplied, with both operands treated as 32-bit two's complement values. When the operation is complete, the doubleword result is added to special register pair HI/LO. No overflow exception occurs under any circumstances. This instruction is available only when the chip has the multiplier-accumulator module hardware and the MAD/MUL bits are set to one in the Configuration and Cache Control (CCC) register. madd executes in multiple cycles, depending on the number of significant bits in the operands. Refer to Table 3.18 on page 3-39.

Operation
    T: t <- (HI || LO) + (GPR[rs] * GPR[rt])
       LO <- t[31..0], HI <- t[63..32]

Exceptions
    None
maddu  Multiply/Add Unsigned

Format
    31-26    25-21    20-16    15-11    10-6     5-0
    special  rs       rt       0        0        maddu
    000000   rs       rt       00000    00000    011101

Syntax
    maddu rs, rt

Description
    The contents of general registers rs and rt are multiplied, with both operands treated as 32-bit unsigned values. When the operation is complete, the doubleword result is added to special register pair HI/LO. No overflow exception occurs under any circumstances. This instruction is available only when the chip has the multiplier-accumulator module hardware and the MAD/MUL bits are set to one in the CCC register. The instruction executes in multiple cycles, depending on the number of significant bits in the operands. Refer to Table 3.18 on page 3-39.

Operation
    T: t <- (HI || LO) + ((0 || GPR[rs]) * (0 || GPR[rt]))
       LO <- t[31..0], HI <- t[63..32]

Exceptions
    None
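The madd and maddu Operation sections describe a 64-bit accumulator that wraps instead of trapping, which is exactly the dot-product pattern mentioned in note 1 of Table 3.7. The sketch below models that in portable C; `hilo_t`, `model_madd`, `model_maddu`, `as_int64`, and `dot_product` are hypothetical names, and the model ignores the cycle timing entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t hi, lo; } hilo_t;   /* hypothetical HI/LO model */

static uint64_t hilo_get(const hilo_t *r) { return ((uint64_t)r->hi << 32) | r->lo; }
static void hilo_set(hilo_t *r, uint64_t t) { r->hi = (uint32_t)(t >> 32); r->lo = (uint32_t)t; }

/* madd rs, rt: (HI || LO) <- (HI || LO) + rs * rt, signed operands.
   The 64-bit sum wraps; there is no overflow exception. */
static void model_madd(hilo_t *r, int32_t rs, int32_t rt) {
    hilo_set(r, hilo_get(r) + (uint64_t)((int64_t)rs * rt));
}

/* maddu rs, rt: same, with zero-extended (unsigned) operands. */
static void model_maddu(hilo_t *r, uint32_t rs, uint32_t rt) {
    hilo_set(r, hilo_get(r) + (uint64_t)rs * rt);
}

/* Reinterpret a 64-bit pattern as two's complement without implementation-defined casts. */
static int64_t as_int64(uint64_t t) {
    return t <= (uint64_t)INT64_MAX ? (int64_t)t : -(int64_t)(~t) - 1;
}

/* Dot product as a chain of madd operations (Table 3.7, note 1). */
static int64_t dot_product(const int32_t *a, const int32_t *b, size_t n) {
    hilo_t acc = {0u, 0u};          /* accumulator starts at zero */
    for (size_t i = 0; i < n; ++i) {
        model_madd(&acc, a[i], b[i]);
    }
    return as_int64(hilo_get(&acc));
}
```

On hardware the loop body would be one madd per element pair, pipelined for one result per clock after the three-cycle latency, with the final sum read back through mfhi/mflo.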
max  Maximum

Format
    31-26    25-21    20-16    15-11    10-6     5-0
    special  rs       rt       rd       0        max
    000000   rs       rt       rd       00000    101001

Syntax
    max rd, rs, rt

Description
    The source operands rs and rt are compared as two's complement values. The larger value is stored in register rd.

Operation
    T: if GPR[rs] > GPR[rt] then
           GPR[rd] <- GPR[rs]
       else
           GPR[rd] <- GPR[rt]
       endif

Exceptions
    None
min  Minimum

Format
    31-26    25-21    20-16    15-11    10-6     5-0
    special  rs       rt       rd       0        min
    000000   rs       rt       rd       00000    101000

Syntax
    min rd, rs, rt

Description
    The source operands rs and rt are compared as two's complement values. The smaller value is stored in register rd.

Operation
    T: if GPR[rs] < GPR[rt] then
           GPR[rd] <- GPR[rs]
       else
           GPR[rd] <- GPR[rt]
       endif

Exceptions
    None

msub  Multiply/Subtract

Format
    31-26    25-21    20-16    15-11    10-6     5-0
    special  rs       rt       0        0        msub
    000000   rs       rt       00000    00000    011110

Syntax
    msub rs, rt

Description
    The contents of general registers rs and rt are multiplied, with both operands treated as 32-bit two's complement values. When the operation is complete, the doubleword result is subtracted from special register pair HI/LO. No overflow exception occurs under any circumstances. This instruction is available only when the chip has the multiplier-accumulator module hardware and the MAD/MUL bits are set to one in the CCC register. The instruction executes in multiple cycles, depending on the number of significant bits in the operands. Refer to Table 3.18 on page 3-39.

Operation
    T: t <- (HI || LO) - (GPR[rs] * GPR[rt])
       LO <- t[31..0], HI <- t[63..32]

Exceptions
    None
msubu  Multiply/Subtract Unsigned

Format
    31-26    25-21    20-16    15-11    10-6     5-0
    special  rs       rt       0        0        msubu
    000000   rs       rt       00000    00000    011111

Syntax
    msubu rs, rt

Description
    The contents of general registers rs and rt are multiplied, with both operands treated as 32-bit unsigned values. When the operation is complete, the doubleword result is subtracted from special register pair HI/LO. No overflow exception occurs under any circumstances. This instruction is available only when the chip has the multiplier-accumulator module hardware and the MAD/MUL bits are set to one in the CCC register. The instruction executes in multiple cycles, depending on the number of significant bits in the operands. Refer to Table 3.18 on page 3-39.

Operation
    T: t <- (HI || LO) - ((0 || GPR[rs]) * (0 || GPR[rt]))
       LO <- t[31..0], HI <- t[63..32]

Exceptions
    None
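msub and msubu mirror madd and maddu with a subtraction. A minimal C model, again with hypothetical names (`hilo_t`, `model_msub`, `model_msubu`), shows the modulo-2^64 wraparound that "no overflow exception" implies, including a result going negative.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t hi, lo; } hilo_t;   /* hypothetical HI/LO model */

static uint64_t hilo_get(const hilo_t *r) { return ((uint64_t)r->hi << 32) | r->lo; }
static void hilo_set(hilo_t *r, uint64_t t) { r->hi = (uint32_t)(t >> 32); r->lo = (uint32_t)t; }

/* msub rs, rt: (HI || LO) <- (HI || LO) - rs * rt, signed operands; wraps, never traps. */
static void model_msub(hilo_t *r, int32_t rs, int32_t rt) {
    hilo_set(r, hilo_get(r) - (uint64_t)((int64_t)rs * rt));
}

/* msubu rs, rt: same, with zero-extended (unsigned) operands. */
static void model_msubu(hilo_t *r, uint32_t rs, uint32_t rt) {
    hilo_set(r, hilo_get(r) - (uint64_t)rs * rt);
}
```

Seeding HI/LO with mthi/mtlo and then issuing msub computes c - a*b in the accumulator, for example when evaluating a residual.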
selsl  Select and Shift Left

Format
    31-26    25-21    20-16    15-11    10-6     5-0
    special  rs       rt       rd       0        selsl
    000000   rs       rt       rd       00000    000101

Syntax
    selsl rd, rs, rt

Description
    The contents of general registers rs and rt are combined to form a 64-bit doubleword. The doubleword is shifted left by the number of bits specified in CP0 register ROTATE, and the upper 32 bits of the result are placed in general register rd. The ROTATE register is CP0 register number 23, with valid bits [4:0].

Operation
    T: s <- ROTATE[4..0]
       GPR[rd] <- GPR[rs][31-s..0] || GPR[rt][31..32-s]

Exceptions
    None
selsr  Select and Shift Right

Format
    31-26    25-21    20-16    15-11    10-6     5-0
    special  rs       rt       rd       0        selsr
    000000   rs       rt       rd       00000    000001

Syntax
    selsr rd, rs, rt

Description
    The contents of general registers rs and rt are combined to form a 64-bit doubleword. The doubleword is shifted right by the number of bits specified in CP0 register ROTATE, and the lower 32 bits of the result are placed in general register rd. The ROTATE register is CP0 register number 23, with valid bits [4:0].

Operation
    T: s <- ROTATE[4..0]
       GPR[rd] <- GPR[rs][s-1..0] || GPR[rt][31..s]

Exceptions
    None
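selsl and selsr are funnel shifts: they extract a 32-bit window from the 64-bit pair rs || rt. The sketch below follows the two Operation sections; `model_selsl`, `model_selsr`, and the `rotate` parameter (standing for the CP0 ROTATE register) are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* selsl rd, rs, rt: upper 32 bits of (rs || rt) << s, with s = ROTATE[4:0].
   Equals rs[31-s..0] || rt[31..32-s]. */
static uint32_t model_selsl(uint32_t rs, uint32_t rt, uint32_t rotate) {
    uint32_t s = rotate & 0x1Fu;
    uint64_t pair = ((uint64_t)rs << 32) | rt;
    return (uint32_t)((pair << s) >> 32);
}

/* selsr rd, rs, rt: lower 32 bits of (rs || rt) >> s.
   Equals rs[s-1..0] || rt[31..s]. */
static uint32_t model_selsr(uint32_t rs, uint32_t rt, uint32_t rotate) {
    uint32_t s = rotate & 0x1Fu;
    uint64_t pair = ((uint64_t)rs << 32) | rt;
    return (uint32_t)(pair >> s);
}
```

Two uses follow directly: with rs == rt the instructions act as 32-bit rotates (left for selsl, right for selsr), and with two consecutive memory words they extract a misaligned 32-bit field that straddles the word boundary.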
waiti  Wait for Interrupt

Format
    31-26    25-21    20-16    15-11    10-6     5-0
    cop0              0        0        0        waiti
    010000   10000    00000    00000    00000    100000

Syntax
    waiti

Description
    When this instruction is executed, the main processor clock stops and instruction execution halts. Execution resumes when a hardware interrupt, NMI, or reset exception is received. While in wait mode, the processor is in a power-saving mode and draws very little current, because the clock is turned off to most of the circuitry. waiti must be followed by two or more nop instructions; otherwise, the results may be undefined. Refer to Appendix C, Programmer's Notes, for further information.

Exceptions
    None
wb  Writeback

Format
    31-26    25-21    20-16    15-0
    cache    base     wb       offset
    101111   base     00100    offset

Syntax
    wb offset(base)

Description
    Eight words of the D-cache line addressed by offset + GPR[base] are written back to memory if the line is dirty. The upper bits of offset + GPR[base] are ignored.

Exceptions
    None
3.11 CPU Instruction Opcode Bit Encoding

Tables 3.17 through 3.23 show the opcode bit encoding for CW4011 instructions. The following key applies to the operation codes referenced in the tables:

*rxf1   Cause reserved instruction exceptions in all current implementations and are reserved for future versions of the architecture.
*rxf2   Cause reserved instruction exceptions in all current implementations and are reserved for future versions of the architecture. *rxf2 is listed separately from the other reserved COPz instructions: the R3000 does not detect these codes as reserved instructions that cause an exception, whereas the R4000 detects them.
*rx40   Cause a reserved instruction exception on R4000 and CW4011 processors (when in R4000 mode). They are used as the restore from exception (rfe) instruction on the R3000, LR33000, LR33300, and CW4011 in R3000 mode.
*rx64   Cause a reserved instruction exception. They are 64-bit instructions on the R4000.
*nrx    Invalid, but do not cause reserved instruction exceptions in CW4011 implementations.
x1      Extended instructions original to CW4011 implementations. They are reserved instructions that cause an exception on the R4000.
x2      The cache operation code marked x2 is valid only for CW4011 processors with CP0 enabled, and causes a reserved instruction exception when CP0 is disabled. Bits [20:16] are sub-opcodes. They are instructions for cache maintenance, and their functions are not compatible with the R4000. The recommended mnemonics are flushi, flushd, flushid, and wb offset(base). Undefined sub-opcodes of the cache instruction do not cause a reserved instruction exception in CW4011 implementations.
x3      Extended instructions original to CW4011 implementations. On the R4000, these codes are used for 64-bit multiply and divide instructions. If the MUL bit or MAD bit in the CCC register is zero, they cause a reserved instruction exception. The CCC register is described in detail in Section 4.3.10, Configuration and Cache Control (CCC) Register, on page 4-22.
x4      Cause a reserved instruction exception if the MUL bit in the CCC register is zero.
x5      The eret operation code marked x5 is valid only on the R4000 and on the CW4011 in R4000 mode.
x6      Coprocessor 3 instructions, which are not available on the R4000. They are available on the R3000 and CW4011.
Table 3.17 CW4011 Opcode Bit Encoding
Rows give opcode bits [31:29]; columns give bits [28:26].

Row     0          1          2          3          4          5          6          7
0       special    regimm     j          jal        beq        bne        blez       bgtz
1       addi       addiu      slti       sltiu      andi       ori        xori       lui
2       cop0       cop1       cop2       cop3 x6    beql       bnel       blezl      bgtzl
3       *rx64      *rx64      *rx64      *rx64      addciu x1  *rxf1      *rxf1      *rxf1
4       lb         lh         lwl        lw         lbu        lhu        lwr        *rx64
5       sb         sh         swl        sw         *rx64      *rx64      swr        cache x2
6       ll         lwc1       lwc2       lwc3 x6    *rx64      *rx64      *rx64      *rx64
7       sc         swc1       swc2       swc3 x6    *rx64      *rx64      *rx64      *rx64

Table 3.18 Special Opcode Bit Encoding
Rows give function bits [5:3]; columns give bits [2:0].

Row     0          1          2          3          4          5          6          7
0       sll        selsr x1   srl        sra        sllv       selsl x1   srlv       srav
1       jr         jalr       ffs x1     ffc x1     syscall    break      *rxf1      sync
2       mfhi x4    mthi x4    mflo x4    mtlo x4    *rx64      *rxf1      *rx64      *rx64
3       mult x4    multu x4   div x4     divu x4    madd x3    maddu x3   msub x3    msubu x3
4       add        addu       sub        subu       and        or         xor        nor
5       min x1     max x1     slt        sltu       *rx64      *rx64      *rx64      *rx64
6       tge        tgeu       tlt        tltu       teq        *rxf1      tne        *rxf1
7       *rx64      *rxf1      *rx64      *rx64      *rx64      *rxf1      *rx64      *rx64
Table 3.19 REGIMM Opcode rt Bit Encoding
Rows give rt bits [20:19]; columns give bits [18:16].

Row     0          1          2          3          4          5          6          7
0       bltz       bgez       bltzl      bgezl      *rxf1      *rxf1      *rxf1      *rxf1
1       tgei       tgeiu      tlti       tltiu      teqi       *rxf1      tnei       *rxf1
2       bltzal     bgezal     bltzall    bgezall    *rxf1      *rxf1      *rxf1      *rxf1
3       *rxf1      *rxf1      *rxf1      *rxf1      *rxf1      *rxf1      *rxf1      *rxf1

Table 3.20 Cache (x2) Opcode rt Bit Encoding
Rows give rt bits [20:19]; columns give bits [18:16].

Row     0          1          2          3          4          5          6          7
0       *nrx       flushi x2  flushd x2  flushid x2 wb x2      *nrx       *nrx       *nrx
1       *nrx       *nrx       *nrx       *nrx       *nrx       *nrx       *nrx       *nrx
2       *nrx       *nrx       *nrx       *nrx       *nrx       *nrx       *nrx       *nrx
3       *nrx       *nrx       *nrx       *nrx       *nrx       *nrx       *nrx       *nrx

Table 3.21 COPz rs Opcode Bit Encoding
Rows give rs bits [25:24]; columns give bits [23:21].

Row     0          1          2          3          4          5          6          7
0       mfcz       *rx64      cfcz       *rxf2      mtcz       *rx64      ctcz       *rxf2
1       bc         *rxf2      *rxf2      *rxf2      *rxf2      *rxf2      *rxf2      *rxf2
2-3     copz (coprocessor-defined instructions)
Table 3.22 COPz rt Opcode Bit Encoding
Rows give rt bits [20:19]; columns give bits [18:16].

Row     0          1          2          3          4          5          6          7
0       bcf        bct        bcfl       bctl       *rxf2      *rxf2      *rxf2      *rxf2
1       *rxf2      *rxf2      *rxf2      *rxf2      *rxf2      *rxf2      *rxf2      *rxf2
2       *rxf2      *rxf2      *rxf2      *rxf2      *rxf2      *rxf2      *rxf2      *rxf2
3       *rxf2      *rxf2      *rxf2      *rxf2      *rxf2      *rxf2      *rxf2      *rxf2

Table 3.23 CP0 Opcode Bit Encoding
Rows give function bits [5:3]; columns give bits [2:0].

Row     0          1          2          3          4          5          6          7
0       *nrx       tlbr       tlbwi      *nrx       *nrx       *nrx       tlbwr      *nrx
1       tlbp       *nrx       *nrx       *nrx       *nrx       *nrx       *nrx       *nrx
2       rfe *rx40  *nrx       *nrx       *nrx       *nrx       *nrx       *nrx       *nrx
3       eret x5    *nrx       *nrx       *nrx       *nrx       *nrx       *nrx       *nrx
4       waiti x1   *nrx       *nrx       *nrx       *nrx       *nrx       *nrx       *nrx
5       *nrx       *nrx       *nrx       *nrx       *nrx       *nrx       *nrx       *nrx
6       *nrx       *nrx       *nrx       *nrx       *nrx       *nrx       *nrx       *nrx
7       *nrx       *nrx       *nrx       *nrx       *nrx       *nrx       *nrx       *nrx
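Because the opcode tables are laid out as (row << 3) | column, each function code can be read straight off Table 3.18 as a two-digit octal number. The sketch below uses that to encode and recognize the CW4011 SPECIAL extensions, for instance in a disassembler or an assembler macro package. The table of mnemonics covers only those extensions, and `ext_funct`, `encode_special`, and `decode_special_ext` are hypothetical helpers, not an LSI Logic tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* CW4011 extensions under the SPECIAL major opcode (000000). Each function
   code is (row bits [5:3] << 3) | column bits [2:0] from Table 3.18,
   which is why the values are written in octal. */
static const struct {
    const char *name;
    uint32_t funct;
} kSpecialExt[] = {
    {"selsr", 001}, {"selsl", 005}, {"ffs", 012},  {"ffc", 013},
    {"madd", 034},  {"maddu", 035}, {"msub", 036}, {"msubu", 037},
    {"min", 050},   {"max", 051},
};
static const size_t kNumSpecialExt = sizeof kSpecialExt / sizeof kSpecialExt[0];

/* Function code for a mnemonic, or -1 if it is not one of the extensions above. */
static int ext_funct(const char *name) {
    for (size_t i = 0; i < kNumSpecialExt; ++i) {
        if (strcmp(kSpecialExt[i].name, name) == 0) return (int)kSpecialExt[i].funct;
    }
    return -1;
}

/* SPECIAL format: [31:26]=000000, [25:21]=rs, [20:16]=rt, [15:11]=rd, [10:6]=0, [5:0]=funct. */
static uint32_t encode_special(uint32_t rs, uint32_t rt, uint32_t rd, uint32_t funct) {
    return ((rs & 0x1Fu) << 21) | ((rt & 0x1Fu) << 16) | ((rd & 0x1Fu) << 11) | (funct & 0x3Fu);
}

/* Mnemonic of a CW4011 SPECIAL extension, or NULL for anything else. */
static const char *decode_special_ext(uint32_t instr) {
    if ((instr >> 26) != 0u) return NULL;          /* major opcode must be SPECIAL */
    for (size_t i = 0; i < kNumSpecialExt; ++i) {
        if ((instr & 0x3Fu) == kSpecialExt[i].funct) return kSpecialExt[i].name;
    }
    return NULL;
}
```

For example, ffs with rs = 4 and rd = 2 encodes as 0x0080100A, matching the ffs format diagram (function 001010).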
Chapter 4
CW4011 Exception Processing

This chapter describes the CW4011 system control coprocessor, coprocessor 0 (CP0), and explains how the CW4011 handles exception processing. The chapter is divided into the following sections:

Section 4.1, Overview
Section 4.2, R3000 Exception Compatibility Mode
Section 4.3, Exception Handling Registers
Section 4.4, Exception Description Details

4.1 Overview

When the CW4011 detects an exception, it suspends the normal sequence of instruction execution, exits user mode, and enters kernel mode, where it can handle the exception. The CW4011 switches to kernel mode regardless of the mode at the time of the exception. The processor then disables interrupts and forces execution of a software handler located at a fixed address in memory. The handler saves the context of the processor; this context must be restored when the exception has been handled. Section 5.2.1, Operating Modes, provides more information on this subject.

When an exception occurs, CP0 loads the Exception Program Counter (EPC) with a restart location where execution may resume after the exception has been serviced. The restart location in the EPC is the address of the instruction that caused the exception or, if the instruction was executing in a branch delay slot, the address of the branch instruction immediately preceding the delay slot. The instruction causing the exception and all instructions following it in the pipeline are aborted; they are refetched after the return from the exception.
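The EPC rule above, and its inverse that a handler needs when it must inspect the faulting instruction, can be sketched as follows. The `in_delay_slot` flag is an input here; on MIPS processors it is usually reported by a branch-delay (BD) bit in the Cause register, and the CW4011's version of that register is described in Section 4.3, so treat the flag as an assumption of this sketch. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Restart address loaded into EPC, per Section 4.1.
   fault_pc:      address of the instruction that raised the exception.
   in_delay_slot: whether that instruction occupied a branch delay slot.
   Instructions are 4 bytes, so the preceding branch is at fault_pc - 4. */
static uint32_t epc_restart(uint32_t fault_pc, bool in_delay_slot) {
    return in_delay_slot ? fault_pc - 4u : fault_pc;
}

/* Inverse, for a handler that needs the faulting instruction itself. */
static uint32_t faulting_instr_addr(uint32_t epc, bool in_delay_slot) {
    return in_delay_slot ? epc + 4u : epc;
}
```

Restarting at the branch rather than at the delay-slot instruction ensures that the branch is re-evaluated, so control flow after the return is the same as if the exception had not occurred.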
This chapter describes the events that can initiate exception processing. Table 4.1 summarizes these events.

Table 4.1 CW4011 Exceptions

Cold Reset
    Deassertion of the CW4011 cold reset signal, CRESETn.
Warm Reset
    Deassertion of the CW4011 warm reset signal, WRESETn.
Nonmaskable Interrupt
    Assertion of the nonmaskable interrupt signal, NMIn.
Debug
    Detection of a program counter breakpoint, data address breakpoint, or trace event. Not supported in standard R3000 and R4000 processors.
Address Error
    Either an attempt to load, fetch, or store a word not aligned on a word boundary, or an attempt to load or store a halfword not aligned on a halfword boundary. A reference to an address with the most-significant bit set while the CW4011 is in user mode may also cause an address error.
TLB Refill
    No TLB entry matches a reference to a mapped address space.
TLB Entry Invalid
    A virtual address reference matches a TLB entry that is marked invalid.
TLB Modified
    A store operation's virtual address reference matches a TLB entry that is marked valid but is not dirty/writable.
Bus Error
    Assertion of the CW4011 external bus error signal, SCBERRn.
Integer Overflow
    Two's complement overflow during an add or subtract.
Trap
    One of the trap instructions results in a true condition.
System Call
    An attempt to execute the syscall instruction.
Breakpoint
    An attempt to execute the break instruction.
Reserved Instruction
    Execution of an instruction with an undefined or reserved major operation code (bits [31:26]), or a SPECIAL instruction whose minor operation code (bits [5:0]) is undefined.
Coprocessor Unusable
    Execution of a coprocessor instruction where the CU (coprocessor usable) bit is not set for the target coprocessor.
Floating Point
    Available for use by an external floating-point coprocessor.
Interrupt
    Assertion of one of the CW4011's six hardware interrupt inputs, or the setting of one of the two software interrupt bits in the Cause register. Interrupts must be enabled.
External Vectored Interrupt
    Assertion of the CW4011 EXVINTn input. Not supported in R3000 and R4000 processors.

4.2 R3000 Exception Compatibility Mode

Although the CW4011 processor is based on the MIPS R4000 architecture, an R3000-style exception processing capability has been added. This facility lets you configure CP0 exception processing so that existing R3000 exception handling code can run on the CW4011 with little or no modification.

R3000 compatibility mode is controlled by the compatibility bit (bit 24) of the Configuration and Cache Control (CCC) register, discussed in Section 4.3.10. The compatibility bit is reset to zero (R4000 mode) when a cold reset exception occurs. If R3000 mode operation is desired, set bit 24 to one as part of the cold reset handler. Once the processor has been placed in R3000 mode, it should be switched back to R4000 mode only by another cold reset. When R3000 mode is enabled, the behavior of the following areas is affected:

Status register
    The lower six bits of the Status register are redefined to implement the kernel/user mode and interrupt enable stack defined by the R3000 architecture. The Status register is discussed in detail in Section 4.3.6, Status Register.
Exception handling vectors
    The exception handling vectors (base and offset) are remapped to those specified by the R3000 architecture. The exception vectors are discussed in detail in Section 4.4.3, Exception Vector Locations.
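The R3000 kernel/user and interrupt-enable stack mentioned above can be sketched in C, which also shows what rfe (Table 3.14) does to it. The bit layout is an assumption: it is the standard R3000 arrangement (bit 5 KUo, 4 IEo, 3 KUp, 2 IEp, 1 KUc, 0 IEc, with KU = 1 meaning user mode), which this section implies but does not spell out; Section 4.3.6 is the authoritative description. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Low six Status bits in R3000 mode, assuming the standard R3000 layout:
   bit 5 KUo, 4 IEo, 3 KUp, 2 IEp, 1 KUc, 0 IEc (KU = 1 means user mode). */
#define SR_STACK_MASK 0x3Fu

/* Exception entry: push the stack two bits left, leaving KUc = 0 (kernel)
   and IEc = 0 (interrupts disabled). The old KUo/IEo pair is discarded. */
static uint32_t r3000_exception_push(uint32_t sr) {
    return (sr & ~SR_STACK_MASK) | ((sr << 2) & 0x3Cu);
}

/* rfe: current <- previous, previous <- old; the old pair is left as is. */
static uint32_t r3000_rfe(uint32_t sr) {
    return (sr & ~0x0Fu) | ((sr >> 2) & 0x0Fu);
}
```

An exception followed by rfe therefore restores the interrupted mode and interrupt-enable state, while the upper Status bits pass through untouched.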
Exception return (RFE vs. ERET): In R3000 compatibility mode, exception return is accomplished with the RFE instruction. An attempt to use the ERET instruction causes a reserved instruction exception.

The following sections describe CW4011 exception handling in more detail. Where appropriate, the differences between standard (R4000) operation and R3000 compatibility mode are noted; in all other cases, operation is identical.

4.3 Exception Handling Registers
This section describes the CP0 registers used in exception processing. Software examines these registers during exception processing to determine the cause of an exception and the state of the CPU at the time of the exception. Table 4.2 lists the registers, and the sections that follow describe each one in detail.

Table 4.2 CP0 Exception Processing Registers
Context: CP0 register 4, Section 4.3.1
Debug Control and Status (DCS): CP0 register 7, Section 4.3.2
Bad Virtual Address (BadVAddr): CP0 register 8, Section 4.3.3
Count: CP0 register 9, Section 4.3.4
Compare: CP0 register 11, Section 4.3.5
Status: CP0 register 12, Section 4.3.6
Cause: CP0 register 13, Section 4.3.7
Exception Program Counter (EPC): CP0 register 14, Section 4.3.8
Processor Revision Identifier (PRId): CP0 register 15, Section 4.3.9
Configuration and Cache Control (CCC): CP0 register 16, Section 4.3.10
Load Linked Address (LLAddr): CP0 register 17, Section 4.3.11
Breakpoint Program Counter (BPC): CP0 register 18, Section 4.3.12
Breakpoint Data Address (BDA): CP0 register 19, Section 4.3.13
Breakpoint PC Mask (BPCM): CP0 register 20, Section 4.3.14
Breakpoint Data Address Mask (BDAM): CP0 register 21, Section 4.3.15
Rotate: CP0 register 23, Section 4.3.16
Circular Mask (CMask): CP0 register 24, Section 4.3.17
Error Exception Program Counter (ErrorEPC): CP0 register 30, Section 4.3.18

Two other CP0 registers that belong to the virtual memory management system also contain important information for exception handling: the Index register (CP0 register 0), described in Section 5.3.2.4, Index Register, and the Random register (CP0 register 1), described in Section 5.3.2.5, Random Register.

Use the MTC0 (move to coprocessor 0) instruction to write these registers and the MFC0 (move from coprocessor 0) instruction to read them.

4.3.1 Context Register
The Context register is a read/write register containing a pointer to an entry in the page table entry (PTE) array. This array is an operating system data structure that stores virtual-to-physical address translations. When there is a TLB miss, operating system software handles the miss by loading the TLB with the missing translation from the PTE array.

The BadVPN field is not writable. It contains the VPN of the most recently translated virtual address that did not have a valid translation (TLBL or TLBS). The PTEBase field is both readable and writable, and indicates the base address of the PTE table of the current user address space.

The Context register duplicates some of the information in the BadVAddr register, but in a form that is more useful to a software TLB exception handler. The operating system can use the Context register to hold a pointer into the PTE array, setting the PTEBase field as needed. Normally, the operating system uses the Context register to address the current page map, which resides in the kernel-mapped segment kseg2. The register is included solely for the use of the operating system. Figure 4.1 shows the format of the Context register.

Figure 4.1 Context Register: PTEBase [31:22], BadVPN [21:2], R [1:0]

PTEBase  Page Table Entry Base  [31:22]
This field is the operating system pointer; it points to the PTE array in memory.

BadVPN  Bad Virtual Page Number  [21:2]
This field contains bits [31:12] of the most recently translated virtual address that did not have a valid translation. This format provides a table of four-byte PTEs for a page size of 4 Kbytes. For other PTE and page sizes, shifting and masking bits [21:2] produces an appropriate address.

R  Reserved  [1:0]
These bits are not used and are read as zero. The CW4011 ignores attempts to set these bits; however, software should write these bits as zero to ensure compatibility with future versions of the software.
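The following C sketch models how the Context value described above is formed after a TLB miss, assuming the 4-Kbyte page, four-byte PTE layout of Figure 4.1. The function `context_after_miss` and the mask macros are hypothetical names; the code works on plain integers and does not read CP0 itself (a real handler would obtain the value with MFC0).

```c
#include <stdint.h>

/* Context register fields (Figure 4.1). */
#define CONTEXT_PTEBASE_MASK 0xFFC00000u /* PTEBase, bits [31:22] */
#define CONTEXT_BADVPN_MASK  0x003FFFFCu /* BadVPN, bits [21:2]   */

/* Hypothetical model of the Context value after a TLB miss:
 * PTEBase (written earlier by the OS) is preserved, bits [31:12] of the
 * faulting virtual address land in BadVPN [21:2], and the reserved
 * bits [1:0] read as zero. */
static inline uint32_t context_after_miss(uint32_t context, uint32_t bad_vaddr)
{
    uint32_t badvpn = (bad_vaddr >> 12) << 2; /* VA[31:12] -> bits [21:2] */
    return (context & CONTEXT_PTEBASE_MASK) | (badvpn & CONTEXT_BADVPN_MASK);
}
```

With 4-Kbyte pages and four-byte PTEs, the result is PTEBase plus VPN x 4, so it can serve directly as the address of the PTE; for example, a miss on 0x00403ABC with PTEBase 0xC0000000 yields 0xC000100C.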
4.3.2 Debug Control and Status (DCS) Register
The DCS register contains the enable and status bits for the CW4011 debug facility. All bits have read/write access. Figure 4.2 shows the format of the DCS register.

Figure 4.2 DCS Register: TR [31], UD [30], KD [29], TE [28], DW [27], DR [26], DAE [25], PCE [24], DE [23], R [22:6], T [5], W [4], RD [3], DA [2], PC [1], DB [0]

TR  Trap  31
This is the trap enable bit. Setting it to one traps debug events to the debug exception vector; clearing it to zero disables the trap. The status bits (T, W, RD, DA, PC, and DB) are still updated with debug event information when TR is cleared.

UD  User Mode Debug Event  30
Set this bit to one to enable detection of debug events while the CW4011 is operating in user mode.

KD  Kernel Mode Debug Event  29
Set this bit to one to enable detection of debug events while the CW4011 is operating in kernel mode.

TE  Trace Event  28
Set this bit to one to enable detection of a trace event (a nonsequential fetch operation).

DW  Data Write  27
Set this bit to one to enable detection of a data write event as defined by the BDA and BDAM registers. This bit is used in conjunction with DAE.

DR  Data Read  26
Set this bit to one to enable detection of a data read event as defined by the BDA and BDAM registers. This bit is used in conjunction with DAE.

DAE  Detect BDA Event  25
Set this bit to one to enable detection of a BDA debug event.
PCE  Program Counter Breakpoint Event  24
Set this bit to one to enable detection of a program counter breakpoint event as defined by the BPC and BPCM registers.

DE  Debug Enable  23
Set this bit to one to enable the debug facility; clearing it disables the debug facility.

R  Reserved  [22:6]
These bits are not used and are read as zero. The CW4011 ignores attempts to set these bits; however, software should write these bits as zero to ensure compatibility with future software revisions.

T  Trace Status  5
The core sets T to one when it detects a trace condition.

W  Write Status  4
The core sets W to one when it detects a write reference to the address specified in the Breakpoint Data Address (BDA) register.

RD  Read Status  3
The core sets RD to one when it detects a read reference to the address specified in the Breakpoint Data Address (BDA) register.

DA  DAE Debug Condition Status  2
The core sets DA to one when it detects a data address debug condition.

PC  PCE Debug Condition Status  1
The core sets PC to one when it detects a program counter debug condition.

DB  Debug Detected Status  0
The core sets DB to one when it detects any debug condition.
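As a sketch, the DCS enable bits can be combined to arm a trapping program counter breakpoint in both user and kernel mode, and the BPC/BPCM comparison of Sections 4.3.12 and 4.3.14 can be modeled in software. The macro and function names are hypothetical, the bit combination is one plausible configuration inferred from the field descriptions above rather than a documented procedure, and the code does not access CP0.

```c
#include <stdbool.h>
#include <stdint.h>

/* DCS enable bits (Figure 4.2). */
#define DCS_TR  (1u << 31) /* trap debug events to the debug vector */
#define DCS_UD  (1u << 30) /* detect events in user mode            */
#define DCS_KD  (1u << 29) /* detect events in kernel mode          */
#define DCS_PCE (1u << 24) /* enable PC breakpoint detection        */
#define DCS_DE  (1u << 23) /* master enable for the debug facility  */
/* DCS status bits, set by the core when a condition is detected. */
#define DCS_PC  (1u << 1)  /* PC debug condition detected           */
#define DCS_DB  (1u << 0)  /* any debug condition detected          */

/* Hypothetical DCS value: trapping PC breakpoint in user and kernel mode. */
static inline uint32_t dcs_pc_breakpoint(void)
{
    return DCS_DE | DCS_PCE | DCS_UD | DCS_KD | DCS_TR;
}

/* Software model of the BPC/BPCM comparison: only PC bits that have a
 * one in BPCM are compared with BPC; bits [1:0] are reserved. */
static inline bool pc_breakpoint_matches(uint32_t pc, uint32_t bpc, uint32_t bpcm)
{
    bpcm &= ~0x3u;
    return ((pc ^ bpc) & bpcm) == 0;
}
```

Clearing low-order bits in BPCM widens the breakpoint to a range: with BPCM = 0xFFFFFF00, any PC in the same 256-byte block as BPC matches.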
4.3.3 Bad Virtual Address (BadVAddr) Register
The BadVAddr register is a read-only register that holds the 32-bit failing virtual address for address error (AdEL, AdES) and TLB translation (TLBL, TLBS, Mod) exceptions. Figure 4.3 shows the format of the BadVAddr register.

Figure 4.3 BadVAddr Register: Bad Virtual Address [31:0]

4.3.4 Count Register
The Count register acts as a timer. It increments at a constant rate, regardless of whether an instruction is executed or retried, or whether any forward progress is made. The Count register increments at half the maximum instruction issue rate.

The Count register is a read/write register; it can be written for diagnostic purposes or during system initialization to synchronize two processors operating in lock step. Figure 4.4 shows the format of the Count register.

Figure 4.4 Count Register: Count [31:0]

4.3.5 Compare Register
The Compare register implements a timer service (see also the Count register). It holds a stable value and is not automatically updated by core events. When the timer facility is enabled (by the TMR bit of the CCC register, Section 4.3.10) and the value of the Count register equals the value of the Compare register, interrupt bit IP[7] in the Cause register is set. This causes an interrupt on the next execution cycle if the interrupt is enabled. Writing a value to the Compare register clears the timer interrupt.
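A minimal C sketch of Count/Compare arithmetic follows. It assumes only what the text states: Count is a 32-bit counter, and the timer interrupt (IP[7]) is raised when Count equals Compare. Because the Count rate (half the maximum instruction issue rate) depends on the clock of a particular design, `count_hz` is a hypothetical parameter, and all function names are illustrative.

```c
#include <stdint.h>

/* Compare value that makes Count == Compare `interval` ticks after
 * `base`. uint32_t arithmetic wraps modulo 2^32, like Count itself. */
static inline uint32_t next_compare(uint32_t base, uint32_t interval)
{
    return base + interval;
}

/* Ticks elapsed between two Count samples; correct across a single
 * wraparound of the 32-bit counter. */
static inline uint32_t count_elapsed(uint32_t start, uint32_t now)
{
    return now - start;
}

/* Convert microseconds to Count ticks for a hypothetical Count rate
 * `count_hz`; the real rate depends on the design's clock. */
static inline uint32_t us_to_ticks(uint32_t us, uint32_t count_hz)
{
    return (uint32_t)(((uint64_t)us * count_hz) / 1000000u);
}
```

A periodic timer handler might write `next_compare(previous_compare, interval)` to Compare, which also clears the timer interrupt; basing the new value on the previous Compare value rather than on the current Count keeps the period from drifting by the handler's own latency.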
For diagnostic purposes, the Compare register is a read/write register; in normal operation it is only written. Figure 4.5 shows the format of the Compare register.

Figure 4.5 Compare Register: Compare [31:0]

4.3.6 Status Register
The Status register is a read/write register that contains the operating mode, interrupt enabling, and diagnostic states of the processor. Its format differs slightly between R4000 mode and R3000 mode. Section 4.3.6.1, R4000 Mode Operation, describes the format for R4000 mode operation, and Section 4.3.6.2, R3000 Mode Operation, describes the format for R3000 mode operation.

4.3.6.1 R4000 Mode Operation
Figure 4.6 shows the format of the R4000 version of the Status register (CCC bit 24 = 0). The figure is followed by bit-field descriptions and information on these R4000 operations:
Interrupt enable
Processor modes
Kernel address space accesses
User address space accesses
Cold reset
Warm reset
Figure 4.6 Status Register (R4000 Mode): CU[3:0] [31:28], R [27:23], BEV [22], R [21], SR [20], R [19:16], INT[5:0] [15:10], SW[1:0] [9:8], R [7:5], KSU[1:0] [4:3], ERL [2], EXL [1], IE [0]

CU[3:0]  Coprocessor Usability  [31:28]
Software uses this field to control access to the coprocessors. When a bit is set to one, the corresponding coprocessor is usable:
CU bit 3: Coprocessor 3
CU bit 2: Coprocessor 2
CU bit 1: Coprocessor 1
CU bit 0: Coprocessor 0
Note that CP0 is always available in kernel mode, regardless of the CU[0] setting.

R  Reserved  [27:23, 21, 19:16, 7:5]
These bits are not used and are read as zero. The CW4011 ignores attempts to set these bits; however, software should write these bits as zero to ensure compatibility with future versions of the software.

BEV  Bootstrap Exception Vector  22
This bit controls the location of the TLB refill and general exception vectors. Setting the bit to one indicates a bootstrap operation, and the bootstrap vector locations are used. When the bit is cleared to zero, the normal exception vectors are used.

SR  Soft Reset  20
The core sets SR to one when a warm reset or a nonmaskable interrupt occurs.

INT[5:0]  Interrupt Mask  [15:10]
This field is a six-bit hardware interrupt mask. Setting a bit to one enables the corresponding hardware interrupt; for example, setting bit 5 of the field enables hardware interrupt 5.
SW[1:0]  Software Interrupt Mask  [9:8]
This field is a two-bit software interrupt mask. Setting a bit to one enables the corresponding software interrupt.

KSU[1:0]  Kernel/User Mode  [4:3]
This field determines the base operating mode of the CW4011 core:
00: Kernel
10: User
All other settings are reserved.

ERL  Error Level  2
This bit determines the error level of the CW4011. When it is set to one, the level is error; when it is cleared to zero, the level is normal.

EXL  Exception Level  1
This bit determines the exception level of the CW4011. When it is set to one, the level is exception; when it is cleared to zero, the level is normal.

IE  Interrupt Enable  0
Setting this bit to one enables interrupts; clearing it to zero disables interrupts.

Interrupt enable: Interrupts are enabled when all of the following conditions are true:
IE is set to one
EXL is cleared to zero
ERL is cleared to zero
If these conditions are met, interrupts are recognized according to the settings of the INT and SW mask bits.

Processor modes: The KSU bits, together with the EXL and ERL bits, define the CW4011 processor modes as follows:
The processor is in user mode when KSU is equal to 0b10 and both EXL and ERL are cleared to zero.
The processor is in kernel mode when any one of the following conditions is true:
KSU is equal to 0b00
EXL is set to one
ERL is set to one

Kernel address space accesses: Access to the kernel address space is allowed only when the processor is in kernel mode.

User address space accesses: Access to the user address space is always allowed.

Cold reset: The contents of the Status register are undefined after a cold reset, except for these bits:
ERL and BEV are set to one
CU[3:0] and SR are cleared to zero

Warm reset: The contents of the Status register are unchanged by a warm reset, except that the ERL, BEV, and SR bits are set to one.

4.3.6.2 R3000 Mode Operation
Figure 4.7 shows the format of the R3000 version of the Status register (CCC bit 24 = 1). The figure is followed by bit-field descriptions and descriptions of these R3000 operations:
Interrupt enable
Processor modes
Kernel address space accesses
User address space accesses
Cold reset
Warm reset
Mode bits and exception processing
Figure 4.7 Status Register (R3000 Mode): CU[3:0] [31:28], R [27:23], BEV [22], R [21], SR [20], R [19:16], INT[5:0] [15:10], SW[1:0] [9:8], R [7:6], KUo [5], IEo [4], KUp [3], IEp [2], KUc [1], IEc [0]

CU[3:0]  Coprocessor Usability  [31:28]
Software uses this field to control access to the coprocessors. When a bit is set to one, the corresponding coprocessor is usable, as shown below:
CU bit 3: Coprocessor 3
CU bit 2: Coprocessor 2
CU bit 1: Coprocessor 1
CU bit 0: Coprocessor 0
Note that CP0 is always available in kernel mode, regardless of the CU[0] setting.

R  Reserved  [27:23, 21, 19:16, 7:6]
These bits are not used and are read as zero. The CW4011 ignores attempts to set these bits; however, software should write these bits as zero to ensure compatibility with future versions of the software.

BEV  Bootstrap Exception Vector  22
This bit controls the location of the TLB refill and general exception vectors. Setting the bit to one indicates a bootstrap operation, and the bootstrap vector locations are used. When the bit is cleared to zero, the normal exception vectors are used.

SR  Soft Reset  20
The core sets SR to one when either a warm reset or a nonmaskable interrupt occurs.

INT[5:0]  Interrupt Mask  [15:10]
This field is a six-bit hardware interrupt mask. Setting a bit to one enables the corresponding hardware interrupt; for example, setting bit 5 of the field to one enables hardware interrupt 5.
SW[1:0]  Software Interrupt Mask  [9:8]
This field is a two-bit software interrupt mask. Setting a bit to one enables the corresponding software interrupt.

KUo  Kernel/User Mode, Old  5
This bit holds the old base operating mode of the CW4011 core. A one indicates user mode; a zero indicates kernel mode. The bit is part of a three-bit stack that holds the old, previous, and current modes.

IEo  Interrupt Enable, Old  4
This bit holds the old interrupt enable setting. A one indicates that interrupts are enabled; a zero indicates that interrupts are disabled. The bit is part of a three-bit stack that holds the old, previous, and current interrupt enable settings.

KUp  Kernel/User Mode, Previous  3
This bit holds the previous base operating mode of the CW4011 core. A one indicates user mode; a zero indicates kernel mode. The bit is part of a three-bit stack that holds the old, previous, and current modes.

IEp  Interrupt Enable, Previous  2
This bit holds the previous interrupt enable setting. A one indicates that interrupts are enabled; a zero indicates that interrupts are disabled. The bit is part of a three-bit stack that holds the old, previous, and current interrupt enable settings.

KUc  Kernel/User Mode, Current  1
This bit holds the current base operating mode of the CW4011 core. A one indicates user mode; a zero indicates kernel mode. The bit is part of a three-bit stack that holds the old, previous, and current modes.

IEc  Interrupt Enable, Current  0
This bit holds the current interrupt enable setting. A one indicates that interrupts are enabled; a zero indicates that interrupts are disabled. The bit is part of a three-bit stack that holds the old, previous, and current interrupt enable settings.

Interrupt enable: Interrupts are enabled when IEc is set to one. In this case, interrupts are recognized according to the settings of the INT and SW masks.

Processor modes: The CW4011 processor mode is defined by the KUc bit. The processor is in user mode when KUc is set to one and in kernel mode when KUc is cleared to zero.

Kernel address space accesses: Access to the kernel address space is allowed only when the processor is in kernel mode.

User address space accesses: Access to the user address space is always allowed.

Cold reset: The CW4011 processor enters R4000 mode upon cold reset; refer to Cold reset in Section 4.3.6.1 for the initial Status register settings in that mode. To enter R3000 mode, set bit 24 of the Configuration and Cache Control (CCC) register to one as part of the cold reset handler. Upon entering R3000 mode after a cold reset, the contents of the Status register are undefined except for the following bits:
BEV is set to one.
CU[3:0], KUc, KUo, KUp, IEc, IEo, IEp, and SR are cleared to zero.

Warm reset: The contents of the Status register are unchanged by a warm reset, except for the following bits:
BEV and SR are set to one.
The KU and IE bits are pushed deeper into the stack, and KUc and IEc are cleared to zero: KUo/IEo ← KUp/IEp ← KUc/IEc ← 0/0.
Mode bits and exception processing: Figure 4.8 shows how the CW4011 core manipulates the Status register during exception recognition.

Figure 4.8 Status Register and Exception Recognition: bits [5:0] hold KUo, IEo, KUp, IEp, KUc, IEc; on exception recognition each pair moves up one position and KUc/IEc become 0/0. (KUc/IEc: current kernel/user mode and interrupt enable bits; KUp/IEp: previous bits; KUo/IEo: old bits.)

When the CW4011 recognizes an exception, it saves the current kernel/user mode bit (KUc) and the current interrupt enable bit (IEc) in the previous kernel/user mode bit (KUp) and previous interrupt enable bit (IEp), respectively. At the same time, the previous bits are saved in the old bits, and the current bits are cleared to zero:
KUo/IEo ← KUp/IEp ← KUc/IEc ← 0/0

When the CW4011 executes a return from exception (RFE) instruction, the values are popped off the stack, and KUc and IEc are restored to their previous values:
KUc/IEc ← KUp/IEp ← KUo/IEo
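The stack manipulation in Figure 4.8 reduces to shifting the low six Status bits by two positions. The sketch below models both directions on a Status value held in a variable; the function names and the `SR_KUIE_STACK` macro are hypothetical. On RFE it leaves KUo/IEo unchanged, which matches R3000 behavior but is an assumption here, because the text above does not state what happens to the old bits.

```c
#include <stdint.h>

#define SR_KUIE_STACK 0x3Fu /* KUo IEo KUp IEp KUc IEc = bits [5:0] */

/* Exception recognition: KUo/IEo <- KUp/IEp <- KUc/IEc <- 0/0.
 * Shifting left by two moves every pair up; the mask drops the
 * outgoing old pair and leaves KUc/IEc cleared. */
static inline uint32_t r3000_push(uint32_t status)
{
    uint32_t stack = (status << 2) & 0x3Cu;
    return (status & ~SR_KUIE_STACK) | stack;
}

/* RFE: KUc/IEc <- KUp/IEp <- KUo/IEo; KUo/IEo keep their value. */
static inline uint32_t r3000_rfe(uint32_t status)
{
    uint32_t stack = status & SR_KUIE_STACK;
    stack = (stack & 0x30u) | (stack >> 2);
    return (status & ~SR_KUIE_STACK) | stack;
}
```

A push followed by an RFE restores the original current and previous pairs, so user code interrupted with interrupts enabled (KUc = 1, IEc = 1) resumes in the same state.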
4.3.7 Cause Register
The Cause register is a read/write register whose contents describe the most recent exception. Only IP[1:0] are writable; all other bits in the register are read-only. Figure 4.9 shows the format of the Cause register.

Figure 4.9 Cause Register: BD [31], BT [30], CE[1:0] [29:28], R [27:16], IP[7:0] [15:8], R [7], ExcCode[4:0] [6:2], R [1:0]

BD  Branch Delay  31
When set, this bit indicates that the last exception was taken while the CW4011 was executing an instruction in a branch delay slot.

BT  Branch Taken  30
If the BD bit is set, this bit indicates whether the branch was taken:
0: Branch not taken
1: Branch taken

CE[1:0]  Coprocessor Error  [29:28]
When a coprocessor unusable exception is taken, this field indicates the coprocessor unit referenced:
11: Coprocessor 3
10: Coprocessor 2
01: Coprocessor 1
00: Coprocessor 0

R  Reserved  [27:16, 7, 1:0]
These bits are not used and are read as zero. The CW4011 ignores attempts to set these bits; however, software should write these bits as zero to ensure compatibility with future versions of the software.
IP[7:0]  Interrupt Pending  [15:8]
This field indicates which interrupts are pending. Bits IP[7:2] correspond to the six external hardware interrupts, and bits IP[1:0] correspond to the two software interrupts. Software can set and clear the software interrupts directly by writing IP[1:0].

ExcCode[4:0]  Exception Code  [6:2]
This field holds the exception code. Table 4.3 lists the valid exception code values.

Table 4.3 Cause Register ExcCode Field (code, mnemonic: description)
0, Int: Interrupt
1, Mod: TLB modification exception
2, TLBL: TLB exception (load or instruction fetch)
3, TLBS: TLB exception (store)
4, AdEL: Address error exception (load or instruction fetch)
5, AdES: Address error exception (store)
6, Bus: Bus error exception
7: Reserved
8, Sys: Syscall exception
9, Bp: Breakpoint exception
10, RI: Reserved instruction exception
11, CpU: Coprocessor unusable exception
12, Ov: Arithmetic overflow exception
13, Tr: Trap exception
14: Reserved
15, FPE: Floating-point exception
16-31: Reserved
4.3.8 Exception Program Counter (EPC) Register
The read/write EPC register contains the address at which processing resumes after an exception has been serviced. For synchronous exceptions, the EPC register contains either:
the virtual address of the instruction that was the direct cause of the exception, or
the virtual address of the immediately preceding branch or jump instruction, when the excepting instruction is in a branch delay slot (in this case the branch delay bit in the Cause register is set).
Figure 4.10 shows the format of the EPC register. Bits [31:2] make up the program counter. Bits [1:0] are not used and are read as zero. The CW4011 ignores attempts to set these bits; however, software should write these bits as zero to ensure compatibility with future versions of the software.

Figure 4.10 EPC Register: Exception Program Counter [31:2], R [1:0]

4.3.9 Processor Revision Identifier (PRId) Register
The 32-bit, read-only PRId register contains information identifying the implementation and revision level of the CW4011 core, as shown in Figure 4.11.

Figure 4.11 PRId Register: R [31:16], LImp [15:12], UImp [11:8], LRev [7:4], URev [3:0]

R  Reserved  [31:16]
These bits are not used and are read as zero. The CW4011 ignores attempts to set these bits; however, software should write these bits as zero to ensure compatibility with future versions of the software.
LImp  LSI Logic Implementation Number  [15:12]
This value is the implementation number of the CW4011; it is currently set to 0x4.

UImp  User Implementation Number  [11:8]
This field holds the user's implementation number. It can be programmed at the core interface using the IMPLOP[3:0] lines.

LRev  LSI Logic Revision Number  [7:4]
This value is the revision number of the CW4011; it is set to 0x1 for the original version.

URev  User Revision Number  [3:0]
The value of this field is interpreted as a processor unit revision number. It can be programmed at the core interface using the REVLOP[3:0] lines.

The revision number can distinguish between some chip revisions. However, LSI Logic does not guarantee that changes to the core will necessarily be reflected in the PRId register, or that changes to the revision number necessarily reflect real core changes. For this reason, these values are not listed, and software should not rely on the revision number in the PRId register to characterize the core.
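As an illustration, the PRId fields of Figure 4.11 can be unpacked as shown below. The structure and function names are hypothetical. Following the caution above, the sketch identifies the core by the LImp value (0x4) only and leaves any interpretation of the revision fields to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

struct prid_fields {
    unsigned limp; /* LSI Logic implementation number, bits [15:12] */
    unsigned uimp; /* user implementation number, bits [11:8]       */
    unsigned lrev; /* LSI Logic revision number, bits [7:4]         */
    unsigned urev; /* user revision number, bits [3:0]              */
};

static inline struct prid_fields prid_decode(uint32_t prid)
{
    struct prid_fields f = {
        .limp = (prid >> 12) & 0xFu,
        .uimp = (prid >> 8) & 0xFu,
        .lrev = (prid >> 4) & 0xFu,
        .urev = prid & 0xFu,
    };
    return f;
}

/* Identify the core by implementation number only; the revision
 * fields are not a reliable indicator of core changes. */
static inline bool prid_is_cw4011(uint32_t prid)
{
    return prid_decode(prid).limp == 0x4u;
}
```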
4.3.10 Configuration and Cache Control (CCC) Register
The CCC register allows software to configure various parts of the CW4011 design (for example, the BIU, TLB, and cache controllers). Figure 4.12 shows the format of the CCC register.

Figure 4.12 CCC Register: R [31:29], EWP [28], R [27], ISR1 [26], EVI [25], CMP [24], IIE [23], DIE [22], MUL [21], MAD [20], TMR [19], BEG [18], IE0 [17], IE1 [16], IS[1:0] [15:14], DE0 [13], DE1 [12], DS[1:0] [11:10], IPWE [9], IPWS[1:0] [8:7], TE [6], WB [5], SR0 [4], SR1 [3], ISC [2], TAG [1], INV [0]

R  Reserved  [31:29, 27]
These bits are not used and are read as zero. The CW4011 ignores attempts to set these bits; however, software should write these bits as zero to ensure compatibility with future versions of the software.

EWP  External Write Priority  28
This bit defines SCbus arbitration priority between data reads and writes in the four-level write buffer. Clearing EWP to zero gives higher priority to data read requests, provided the read address does not match any of the write addresses in the write buffer. Setting EWP to one gives higher priority to data writes.

ISR1  I-Cache Scratchpad RAM  26
Setting this bit to one enables I-cache set 1 to be used as a scratchpad RAM. Clearing ISR1 to zero disables I-cache set 1 scratchpad RAM mode.

EVI  External Vectored Interrupt  25
This bit enables and disables external vectored interrupts. Setting the bit to one enables the interrupt; clearing it to zero disables the interrupt.

CMP  R3000 Compatibility  24
This bit enables and disables the R3000 exception processing and Status register compatibility mode. Setting the bit to one enables the mode; clearing it disables the mode.
IIE  I-Cache Invalidate Enable  23
This bit enables and disables the I-cache invalidate interface. Setting IIE to one enables the interface; clearing it to zero disables the interface.

DIE  D-Cache Invalidate Enable  22
This bit enables and disables the D-cache invalidate interface. Setting the bit to one enables the interface; clearing it to zero disables the interface.

MUL  Multiplier Enable  21
This bit enables and disables the hardware multiplier. Setting MUL to one enables the multiplier; clearing it disables the multiplier.

MAD  Multiply Accumulate Extensions  20
This bit allows the multiplier to support the multiply-accumulate extensions. Setting the bit to one enables the feature; clearing it disables the feature. When this bit is set, MUL must also be set.

TMR  Timer  19
This bit is the timer facility enable. When TMR is set to one, external hardware interrupt 5 is disabled and the core enables the CP0 Count/Compare timer facility in its place. The timer then replaces interrupt 5 in the Cause register IP[7] bit.

BEG  BIU Bus Enable Grant  18
This bit enables and disables the BIU bus grant. Setting this bit to one enables the external bus master; clearing it to zero causes the CW4011 core to ignore the external bus master.

IE0  I-Cache Set 0 Enable  17
This bit enables and disables set 0 of the I-cache. Setting the bit to one enables set 0; clearing it to zero disables set 0.

IE1  I-Cache Set 1 Enable  16
This bit enables and disables set 1 of the I-cache. Setting the bit to one enables set 1; clearing it to zero disables set 1.
IS[1:0]  I-Cache Size  [15:14]
This field determines the size of each I-cache set:
00: 1 Kbyte
01: 2 Kbytes
10: 4 Kbytes
11: 8 Kbytes

DE0  D-Cache Set 0 Enable  13
This bit enables and disables set 0 of the D-cache. Setting the bit to one enables set 0; clearing it to zero disables set 0.

DE1  D-Cache Set 1 Enable  12
This bit enables and disables set 1 of the D-cache. Setting the bit to one enables set 1; clearing it to zero disables set 1.

DS[1:0]  D-Cache Size  [11:10]
This field determines the size of each D-cache set:
00: 1 Kbyte
01: 2 Kbytes
10: 4 Kbytes
11: 8 Kbytes

IPWE  In-Page Write Enable  9
This bit enables and disables in-page write operations. Setting the bit to one enables in-page write; clearing it to zero disables in-page write.
IPWS[1:0]  In-Page Write Size  [8:7]
This field determines the external DRAM page size for in-page write operations:
00: 1 Kbyte
01: 2 Kbytes
10: 4 Kbytes
11: 8 Kbytes

TE  TLB Enable  6
This bit enables and disables the TLB. Setting the bit to one enables the TLB, if one is present; clearing the bit to zero disables the TLB.

WB  Writeback  5
This bit defines the caching algorithm used for kseg0. In addition, when the TLB is absent or disabled, it also defines the caching algorithm for kuseg and kseg2. Setting WB to one enables writeback operation; clearing WB to zero enables writethrough operation.

SR0  Scratchpad RAM Mode Set 0  4
This bit enables and disables scratchpad RAM mode for set 0 of the D-cache. Setting the bit to one enables scratchpad mode; clearing it to zero disables scratchpad mode.

SR1  Scratchpad RAM Mode Set 1  3
This bit enables and disables scratchpad RAM mode for set 1 of the D-cache. Setting the bit to one enables scratchpad mode; clearing it to zero disables scratchpad mode.

ISC  Isolate Cache  2
This bit enables isolate cache mode, in which stores to the cache are not propagated to external memory. Setting the bit to one enables the mode; clearing it to zero disables the mode.

TAG  Tag Test Mode  1
This bit enables and disables tag test mode, which is used for cache maintenance. Setting the bit to one enables the mode; clearing it to zero disables the mode.

INV  Invalidate Cache Mode  0
This bit enables and disables cache invalidate mode, which is used for cache maintenance. Setting the bit to one enables the mode; clearing it to zero disables the mode.

4.3.11 Load Linked Address (LLAddr) Register
The LLAddr register is a read/write register that contains the physical address (PAddr[31:2]) read by the most recent load linked instruction. This register is used for diagnostic purposes only and serves no function during normal operation. Figure 4.13 shows the format of the LLAddr register. Bits [31:2] contain the physical address (PAddr); bits [1:0] are reserved and cleared to zero.

Figure 4.13 LLAddr Register: PAddr[31:2] [31:2], R [1:0]
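Returning to the CCC register of Section 4.3.10, the sketch below decodes the cache and in-page write size fields, which share one encoding (size in Kbytes is 2 raised to the field value), and sets the multiply-accumulate bits together, since MAD requires MUL. The macro and function names are hypothetical, and the code manipulates a CCC value held in a variable rather than the register itself.

```c
#include <stdint.h>

#define CCC_CMP      (1u << 24) /* R3000 compatibility mode            */
#define CCC_MUL      (1u << 21) /* hardware multiplier enable          */
#define CCC_MAD      (1u << 20) /* multiply-accumulate; requires MUL   */
#define CCC_TMR      (1u << 19) /* Count/Compare replaces interrupt 5  */
#define CCC_IS(c)    (((c) >> 14) & 0x3u) /* I-cache size per set      */
#define CCC_DS(c)    (((c) >> 10) & 0x3u) /* D-cache size per set      */
#define CCC_IPWS(c)  (((c) >> 7) & 0x3u)  /* in-page write DRAM page   */

/* IS, DS and IPWS encode 00 = 1, 01 = 2, 10 = 4, 11 = 8 Kbytes. */
static inline unsigned ccc_kbytes(unsigned code)
{
    return 1u << code;
}

/* Enable multiply-accumulate; MUL must be set whenever MAD is set. */
static inline uint32_t ccc_enable_mad(uint32_t ccc)
{
    return ccc | CCC_MUL | CCC_MAD;
}
```

A cold reset handler that wants R3000 compatibility mode would OR in `CCC_CMP` (bit 24) in the same way before writing the value back with MTC0.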
4.3.12 Breakpoint Program Counter (BPC) Register
Software uses the read/write BPC register to specify a program counter breakpoint. The BPC register is used in conjunction with the Breakpoint PC Mask (BPCM) register, described in Section 4.3.14. Figure 4.14 shows the format of the 32-bit BPC register. Bits [1:0] are reserved and cleared to zero.

Figure 4.14 BPC Register: Breakpoint Program Counter [31:2], R [1:0]

4.3.13 Breakpoint Data Address (BDA) Register
Software uses the read/write BDA register to specify a virtual data address breakpoint. The BDA register is used in conjunction with the Breakpoint Data Address Mask (BDAM) register, described in Section 4.3.15. Figure 4.15 shows the format of the 32-bit BDA register.

Figure 4.15 BDA Register: Breakpoint Data Address [31:0]

4.3.14 Breakpoint PC Mask (BPCM) Register
The read/write BPCM register masks bits in the BPC register. A one in any BPCM bit means that the CW4011 compares the corresponding PC bit with the same bit of the BPC register when checking for program counter exceptions. A zero means that the CW4011 does not compare that PC bit with the BPC register. Figure 4.16 shows the format of the 32-bit BPCM register. Bits [1:0] are reserved and cleared to zero.
Figure 4.16 BPCM Register: Breakpoint Program Counter Mask [31:2], R [1:0]

4.3.15 Breakpoint Data Address Mask (BDAM) Register
The read/write BDAM register masks bits in the BDA register. A one in any BDAM bit means that the CW4011 compares the corresponding virtual data address bit with the same bit of the BDA register when checking for data address (debug) exceptions. A zero means that the CW4011 does not compare that address bit with the BDA register. Figure 4.17 shows the format of the 32-bit BDAM register.

Figure 4.17 BDAM Register: Breakpoint Data Address Mask [31:0]

4.3.16 Rotate Register
The select and rotate left (SELSL) and select and rotate right (SELSR) instructions use the lower five bits of the Rotate register (bits [4:0]) as the shift count. This is useful for data alignment operations in graphics and for bit-field selection routines in data transmission and compression applications. Even though the Rotate register resides in CP0, user-mode access to the register is always granted, regardless of the value of the CU0 bit in the Status register. Figure 4.18 shows the format of the Rotate register.
Figure 4.18 Rotate Register (bits [31:5]: R, reserved; bits [4:0]: Rotate)

R  Reserved  [31:5]
These bits are not used and are read as zero. The CW4011 ignores attempts to set these bits; however, software should write them as zero to ensure compatibility with future versions of the software.

Rotate  Rotate  [4:0]
This field determines the shift count.

4.3.17 Circular Mask (CMASK) Register

The CMASK register is used by the CW4011 instruction set extensions. The load/store word/halfword/byte with update circular instructions store a value in the destination register and update the base address register with base + offset, modified according to the value of CMASK bits [4:0]. This feature is important in DSP (digital signal processing) and other applications that use circular buffers. Even though the CMASK register resides in CP0, user-mode access to it is always granted, regardless of the value of Status[CU0]. Figure 4.19 shows the format of the CMASK register.

Figure 4.19 CMASK Register (bits [31:5]: R, reserved; bits [4:0]: CMASK)

R  Reserved  [31:5]
These bits are not used and are read as zero. The CW4011 ignores attempts to set these bits; however, software should write them as zero to ensure compatibility with future versions of the software.

CMASK  Circular Mask  [4:0]
This field contains the circular mask.
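The section does not spell out exactly how CMASK modifies base + offset, so the following is only a hypothetical illustration of circular-buffer addressing. It assumes that CMASK[4:0] = k describes a buffer of 2^k bytes, that the low k bits of base + offset wrap inside that buffer, and that the upper base bits are kept unchanged. The function name is invented.

```c
#include <stdint.h>

/* HYPOTHETICAL model of "update circular" base-register writeback.
   ASSUMPTION: CMASK[4:0] = k selects a 2^k-byte buffer; the low k bits of
   base + offset wrap within it and the upper bits of base are preserved.
   This illustrates the technique, not documented CW4011 behavior. */
static uint32_t circular_update(uint32_t base, int32_t offset, uint32_t cmask_reg)
{
    unsigned k = cmask_reg & 0x1Fu;
    uint32_t wrap = (UINT32_C(1) << k) - 1u;   /* low-bit mask, k in 0..31 */
    uint32_t sum = base + (uint32_t)offset;     /* modulo-2^32 add */
    return (base & ~wrap) | (sum & wrap);
}
```

Under this assumption, stepping a pointer by +4 through a 16-byte buffer at 0x1000 (k = 4) goes 0x1004, 0x1008, 0x100C, and then wraps back to 0x1000.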
4.3.18 Error Exception Program Counter (Error EPC) Register

The Error EPC register is similar to the EPC register. It stores the PC (program counter) on cold reset, warm reset, and NMI exceptions. The read/write Error EPC register contains the virtual address at which instruction processing can resume after the exception has been serviced. This address is either:

- the virtual address of the first instruction terminated by the exception, or
- the virtual address of the immediately preceding branch or jump instruction, when the terminated instruction is in a branch delay slot.

There is no branch delay slot indication for the Error EPC register. Figure 4.20 shows the format of the Error EPC register. Bits [31:2] make up the Error EPC; bits [1:0] are reserved and cleared to zero.

Figure 4.20 Error EPC Register (bits [31:2]: Error EPC; bits [1:0]: R, reserved)

4.4 Exception Description Details

This section describes each of the CW4011 core exceptions, what causes them, and how they are handled and serviced. It is divided as follows:

- Section 4.4.1, Exception Operation
- Section 4.4.2, Precision of Exceptions
- Section 4.4.3, Exception Vector Locations
- Section 4.4.4, Priority of Exceptions
- Section 4.4.5, Reset Exceptions
- Section 4.4.6, Interrupt Exceptions
- Section 4.4.7, Address Error Exception
- Section 4.4.8, TLB Exceptions
- Section 4.4.9, Bus Error Exception
- Section 4.4.10, Integer Overflow Exception
- Section 4.4.11, Trap Exception
- Section 4.4.12, System Call Exception
- Section 4.4.13, Breakpoint Exception
- Section 4.4.14, Reserved Instruction Exception
- Section 4.4.15, Floating-Point Exception
- Section 4.4.16, Coprocessor Unusable Exception
- Section 4.4.17, Debug Exception

4.4.1 Exception Operation

To handle an exception, the processor saves the current operating state, enters kernel mode, disables interrupts, and forces execution of a handler at a fixed address. To resume normal operation, the operating state must be restored and interrupts re-enabled.

When an exception occurs, the EPC register is loaded with the restart location at which execution can resume after the exception has been serviced. The EPC register contains the address of the instruction associated with the exception or, if that instruction was executing in a branch delay slot, the address of the immediately preceding branch instruction.

4.4.1.1 R4000 Mode Operation (Default After Cold Reset)

The CW4011 processor uses the following mechanisms for saving and restoring the operating mode and interrupt status:

- a single interrupt enable bit (IE) in the Status register
- a base operating mode (user, kernel) in the KSU field of the Status register
- an exception level (normal, exception) in the EXL field of the Status register
- an error level (normal, error) in the ERL field of the Status register

Interrupts are enabled by setting the IE bit to one and both levels (EXL, ERL) to normal.
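The R4000-mode rules above, together with the mode definitions in Table 4.4, can be written as two predicates on the Status register. This is a sketch with hypothetical names: the IE, EXL, and ERL positions (bits 0, 1, 2) follow the pseudocode in Figures 4.21 to 4.23, while the KSU position (bits [4:3]) is the conventional R4000 placement and is assumed here rather than stated in this section.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status register fields in R4000 mode (KSU position assumed). */
#define SR_IE        (UINT32_C(1) << 0)
#define SR_EXL       (UINT32_C(1) << 1)
#define SR_ERL       (UINT32_C(1) << 2)
#define SR_KSU_SHIFT 3
#define SR_KSU_MASK  (UINT32_C(3) << SR_KSU_SHIFT)

/* Interrupts are taken only when IE = 1 and EXL = ERL = 0 (normal). */
static bool r4k_interrupts_enabled(uint32_t sr)
{
    return (sr & (SR_IE | SR_EXL | SR_ERL)) == SR_IE;
}

/* Table 4.4: user mode only when KSU = 10 and EXL = ERL = 0;
   a set EXL or ERL forces kernel mode. */
static bool r4k_user_mode(uint32_t sr)
{
    if (sr & (SR_EXL | SR_ERL))
        return false;
    return ((sr & SR_KSU_MASK) >> SR_KSU_SHIFT) == 2u;
}
```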
Table 4.4 shows how the current processor operating mode is defined. Exceptions set the exception level to exception (EXL = 1). The exception handler typically resets the exception level to normal (EXL = 0) after saving the appropriate state, and sets it back to exception while restoring that state. Returning from an exception (ERET instruction) resets the exception level to normal.

Table 4.4 Current Processor Mode

Current Mode   Status KSU[1:0]   Status EXL   Status ERL
User           10                0            0
Kernel         00                0            0
Kernel         xx                1            0
Kernel         xx                0            1

4.4.1.2 R3000 Mode Operation

R3000 mode operation is much simpler than R4000 mode. The current processor operating state is always defined by the KUc bit (0 = kernel, 1 = user). The basic mechanism for saving and restoring the operating state of the processor is the kernel/user (KU) and interrupt enable (IE) stack located in the bottom six bits of the Status register.

When responding to an exception, the current mode bits (KUc/IEc) are saved into the previous mode bits (KUp/IEp), the previous mode bits are saved into the old mode bits (KUo/IEo), and the current mode bits (KUc/IEc) are both cleared to zero.

After exception processing has been completed, the saved state is restored using the RFE instruction, which copies the previous mode bits back into the current mode bits and the old mode bits back into the previous mode bits. The old mode bits are left unchanged.

4.4.1.3 Exception Processing Diagrams

Figures 4.21 to 4.25 show the basic set of actions taken for each of the major CW4011 exception classes: cold reset, warm reset, nonmaskable interrupt (NMI), common, debug, and external vectored interrupt.
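The R3000-mode KU/IE stack behaves like a three-entry shift register, which can be sketched as two bit operations. The function names are hypothetical. The bit order within each pair (IEc = bit 0, KUc = bit 1, IEp = bit 2, KUp = bit 3, IEo = bit 4, KUo = bit 5) is the standard R3000 layout and is an assumption here; the shift by two positions matches the R3000 branch of Figure 4.22 (SR[19:6] || SR[3:0] || 0^2).

```c
#include <stdint.h>

/* R3000-mode KU/IE stack in Status[5:0] (standard R3000 ordering assumed):
   IEc = bit 0, KUc = bit 1, IEp = bit 2, KUp = bit 3, IEo = bit 4, KUo = bit 5. */

/* Exception entry: current -> previous -> old, current cleared to 0/0
   (kernel mode, interrupts disabled). Upper Status bits are untouched. */
static uint32_t r3k_push_ku_ie(uint32_t sr)
{
    return (sr & ~UINT32_C(0x3F)) | ((sr << 2) & UINT32_C(0x3C));
}

/* RFE: previous -> current, old -> previous; the old bits stay unchanged. */
static uint32_t r3k_rfe(uint32_t sr)
{
    return (sr & ~UINT32_C(0x0F)) | ((sr >> 2) & UINT32_C(0x0F));
}
```

A push followed by an RFE restores the current and previous pairs, while the old pair keeps the value it received during the push.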
Figure 4.21 Cold Reset Exception

    Random  ← TLBENTRIES - 1
    Wired   ← 0
    CCC     ← 0^32
    DCS     ← 0^32
    ErrorPC ← PC
    SR      ← 0^4 || SR[27:23] || 1 || 0 || 0 || SR[19:3] || 1 || SR[1:0]
    PC      ← 0xBFC0 0000

Figure 4.22 Warm Reset, NMI Exceptions

    ErrorPC ← PC
    if (CCC24 = 0) then
        SR ← SR[31:23] || 1 || 0 || 1 || SR[19:3] || 1 || SR[1:0]
    else
        SR ← SR[31:23] || 1 || 0 || 1 || SR[19:6] || SR[3:0] || 0^2
    endif
    PC ← 0xBFC0 0000

Figure 4.23 Common Exceptions

    Cause ← BD || BT || CE || 0^12 || Cause[15:8] || 0 || ExcCode || 0^2
    if ((CCC24 = 1) | (SR1 = 0)) then
        EPC ← PC
    endif
    if (CCC24 = 0) then
        SR ← SR[31:2] || 1 || SR0
    else
        SR ← SR[31:6] || SR[3:0] || 0^2
    endif
    if (SR22 = 1) then
        if (CCC24 = 0) then
            PC ← 0xBFC0 0200 + vector offset
        else
            PC ← 0xBFC0 0100 + vector offset
        endif
    else
        PC ← 0x8000 0000 + vector offset
    endif
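The Cause, EPC, and Status updates of Figure 4.23 can be transcribed into C to make the bit layout concrete. This is a sketch, not an RTL model: the struct and function names are hypothetical, vector selection is left out, and `pc` is the restart address the caller would have computed (the faulting instruction, or the preceding branch for a delay-slot instruction). Bit positions come directly from the figure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Cause = BD || BT || CE[1:0] || 0^12 || IP[7:0] || 0 || ExcCode[4:0] || 0^2.
   Status bit 1 = EXL (R4000 mode); Status[5:0] = KU/IE stack (R3000 mode). */
typedef struct {
    uint32_t cause, sr, epc;
} cp0_state;

static void enter_common_exception(cp0_state *s, uint32_t pc, bool r3000_mode,
                                   bool bd, bool bt, unsigned ce, unsigned exccode)
{
    s->cause = ((uint32_t)bd << 31) | ((uint32_t)bt << 30)
             | ((uint32_t)(ce & 3u) << 28)
             | (s->cause & UINT32_C(0x0000FF00))      /* keep IP[7:0] */
             | ((uint32_t)(exccode & 0x1Fu) << 2);

    /* R3000 mode (CCC24 = 1) always saves EPC; R4000 mode only when EXL = 0. */
    if (r3000_mode || (s->sr & UINT32_C(0x2)) == 0)
        s->epc = pc;

    if (!r3000_mode)
        s->sr |= UINT32_C(0x2);                       /* SR[31:2] || 1 || SR0 */
    else                                              /* SR[31:6] || SR[3:0] || 0^2 */
        s->sr = (s->sr & ~UINT32_C(0x3F)) | ((s->sr << 2) & UINT32_C(0x3C));
}
```

The EPC condition is what lets a nested exception in R4000 mode (EXL already set) leave the original restart address intact.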
Figure 4.24 Debug Exception

    DCS ← DCS[31:6] || T || W || R || DA || PC || DB
    Cause ← BD || BT || Cause[29:0]
    if ((CCC24 = 1) | (SR1 = 0)) then
        EPC ← PC
    endif
    if (CCC24 = 0) then
        SR ← SR[31:2] || 1 || SR0
    else
        SR ← SR[31:6] || SR[3:0] || 0^2
    endif
    if (SR22 = 1) then
        if (CCC24 = 0) then
            PC ← 0xBFC0 0200 + vector offset
        else
            PC ← 0xBFC0 0100 + vector offset
        endif
    else
        PC ← 0x8000 0000 + vector offset
    endif

Figure 4.25 External Vectored Interrupt Exception

    Cause ← BD || BT || Cause[29:0]
    if ((CCC24 = 1) | (SR1 = 0)) then
        EPC ← PC
    endif
    if (CCC24 = 0) then
        SR ← SR[31:2] || 1 || SR0
    else
        SR ← SR[31:6] || SR[3:0] || 0^2
    endif
    PC ← EXVAP[31:2] || 0^2

4.4.2 Precision of Exceptions

Exceptions are logically precise: the instruction that causes an exception and all instructions that follow it are aborted, generally before committing any state; execution picks up where it left off before the exception; and the instruction can be re-executed after the exception has been serviced. When following instructions are killed, any exceptions associated with them are killed as well, so exceptions are taken in instruction fetch order rather than in the order in which they are detected.

Interrupts generated by external devices attached to the processor have a variety of meanings, depending on the system environment into which the CW4011 core is designed. Variations in memory system design can affect the meaning of bus error exceptions and the location of, and means of accessing, the parameters needed to service them. As far as possible, this architectural description of the exception handling system defines which state information is reliable and which is unreliable.

In some cases, however, the characteristics of the pipeline staging cannot guarantee that all state in the processor and associated system remains completely unchanged by the (possibly incomplete) execution of instructions immediately following an instruction that has caused an exception. State changes that may occur include the following:

- Instructions may be read from memory and loaded into the I-cache.
- The multiply/divide registers (HI and LO) may have been altered by a MULT/MULTU, DIV/DIVU, or MTHI/MTLO instruction.

These changes can normally be ignored, because the state of the machine is sufficiently restored to allow execution to resume after the exception has been serviced.

4.4.3 Exception Vector Locations

The cold reset, warm reset, and NMI exceptions are always vectored to location 0xBFC00000. Addresses for other exceptions combine a vector offset with a base address; the base address is determined by the BEV bit of the Status register and by the operating mode (CCC24). Table 4.5 shows the vector base addresses and Table 4.6 shows the vector offsets.

Table 4.5 Exception Vector Base Addresses

BEV   R4000 Mode (CCC24 = 0)   R3000 Mode (CCC24 = 1)
0     0x80000000               0x80000000
1     0xBFC00200               0xBFC00100

Table 4.6 Exception Vector Offset Addresses

Exception     R4000 Mode (CCC24 = 0)   R3000 Mode (CCC24 = 1)
TLB refill    0x000 (EXL = 0)          0x000 (kuseg access)
Debug         0x040                    0x040
All others    0x180                    0x080
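Tables 4.5 and 4.6 combine into a single address calculation. The sketch below uses hypothetical names; the caller supplies whether the special TLB refill offset applies (in R4000 mode when EXL = 0, in R3000 mode for a kuseg reference), since that depends on state outside these tables. Cold reset, warm reset, and NMI are excluded because they always use 0xBFC00000.

```c
#include <stdbool.h>
#include <stdint.h>

/* Exception classes distinguished by Table 4.6 (names are illustrative). */
typedef enum { VEC_TLB_REFILL, VEC_DEBUG, VEC_OTHER } vector_class;

/* Vector address = base (Table 4.5) + offset (Table 4.6). refill_special is
   true when a TLB refill uses offset 0x000; otherwise a refill falls back to
   the common offset. */
static uint32_t exception_vector(bool bev, bool r3000_mode,
                                 vector_class cls, bool refill_special)
{
    uint32_t base = !bev ? UINT32_C(0x80000000)
                  : (r3000_mode ? UINT32_C(0xBFC00100) : UINT32_C(0xBFC00200));
    uint32_t common = r3000_mode ? 0x080u : 0x180u;
    uint32_t offset;

    switch (cls) {
    case VEC_TLB_REFILL: offset = refill_special ? 0x000u : common; break;
    case VEC_DEBUG:      offset = 0x040u; break;
    default:             offset = common; break;
    }
    return base + offset;
}
```

For example, with BEV = 0 in R4000 mode, a common exception vectors to 0x80000180, and a TLB refill taken with EXL = 0 vectors to 0x80000000.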
4.4.4 Priority of Exceptions

While more than one exception can occur for a single instruction, only one exception is reported. Table 4.7 shows the priority order given to the exceptions, with cold reset having the highest priority.

Table 4.7 Exception Priority Order (highest to lowest)

1. Cold reset
2. Warm reset
3. Nonmaskable interrupt
4. Address error (instruction fetch)
5. TLB refill (instruction fetch)
6. TLB invalid (instruction fetch)
7. Bus error
8. Integer overflow, trap, system call, breakpoint, reserved instruction, coprocessor unusable, floating-point error
9. Address error (data access)
10. TLB refill (data access)
11. TLB invalid (data access)
12. TLB modified (data write)
13. Interrupt
14. External vectored interrupt
15. Debug

4.4.5 Reset Exceptions

This subsection describes the cold and warm reset exceptions.
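Selecting the one reported exception from Table 4.7 is a priority encoder: scan from highest to lowest and report the first one pending. In this sketch the enumerator names are hypothetical, and the exceptions that share row 8 of the table are collapsed into a single entry.

```c
/* Exception sources in Table 4.7 order, highest priority first. */
typedef enum {
    EX_COLD_RESET, EX_WARM_RESET, EX_NMI,
    EX_ADDR_ERR_FETCH, EX_TLB_REFILL_FETCH, EX_TLB_INVALID_FETCH,
    EX_BUS_ERROR,
    EX_EXECUTE_CLASS,   /* overflow, trap, syscall, break, RI, CpU, FP error */
    EX_ADDR_ERR_DATA, EX_TLB_REFILL_DATA, EX_TLB_INVALID_DATA,
    EX_TLB_MODIFIED,
    EX_INTERRUPT, EX_EXT_VECTORED_INT, EX_DEBUG,
    EX_COUNT            /* also used as "nothing pending" */
} exception_id;

/* Bit n of pending set means exception_id n is pending.
   Returns the highest-priority pending exception, or EX_COUNT if none. */
static exception_id reported_exception(unsigned pending)
{
    for (int id = 0; id < EX_COUNT; ++id)
        if (pending & (1u << id))
            return (exception_id)id;
    return EX_COUNT;
}
```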
4.4.5.1 Cold Reset Exception

The primary purpose of a cold reset is to initialize the CW4011 core at power-up. This section describes the cause of and response to a cold reset exception.

Cause: The cold reset exception occurs when the CRESETN signal is asserted and then deasserted. This exception is not maskable.

Handling: The CPU provides a special exception vector (0xBFC00000) for the cold reset exception. The reset vector resides in unmapped and uncached CPU address space, so the hardware need not initialize the TLB or the cache to handle the exception. The processor can fetch and execute instructions while the caches and virtual memory are in an undefined state.

The contents of all registers in the CPU are undefined when the cold reset exception occurs, except for the following:

- In the Status register, the CU[3:0] and SR bits are cleared to zero and the ERL and BEV bits are set to one. Other bits are undefined.
- The Random register is initialized to the value of its upper boundary.
- The Wired register is initialized to zero.

Servicing: The cold reset exception is serviced by initializing all processor registers, coprocessor registers, caches, and the memory system, performing diagnostic tests, and bootstrapping the operating system.

4.4.5.2 Warm Reset Exception

The primary purpose of the warm reset exception is to reinitialize the processor after a fatal error. Unlike nonmaskable interrupts, this exception resets all cache and bus state machines. Like cold reset, it can be used on the processor in any state; the caches, TLB, and normal exception vectors need not be properly initialized. This section describes the cause of and response to a warm reset exception.

Cause: The warm reset exception occurs when the WRESETN signal is asserted and then deasserted. This exception is not maskable.
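The Status register line of Figure 4.21 can be read as a set of bit clears and sets. This sketch (hypothetical function name) transcribes it directly: CU[3:0] in bits 31:28 and bits 21:20 (bit 20 is SR) are cleared, BEV (bit 22) and ERL (bit 2) are set, and every other bit keeps its previous, undefined value.

```c
#include <stdint.h>

/* Figure 4.21: SR <- 0^4 || SR[27:23] || 1 || 0 || 0 || SR[19:3] || 1 || SR[1:0] */
static uint32_t sr_after_cold_reset(uint32_t sr)
{
    sr &= ~(UINT32_C(0xF) << 28);   /* CU[3:0] <- 0 */
    sr &= ~(UINT32_C(3) << 20);     /* bits 21:20 <- 0 (SR = 0) */
    sr |= UINT32_C(1) << 22;        /* BEV <- 1 */
    sr |= UINT32_C(1) << 2;         /* ERL <- 1 */
    return sr;
}
```

Starting from an all-zero register, this yields 0x00400004 (BEV and ERL set).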
Handling: The reset exception vector (0xBFC00000) is used for this exception. The vector resides in unmapped and uncached CPU address space, so the hardware need not initialize the TLB or the cache to handle the exception. The SR bit of the Status register is set to distinguish a warm reset exception from a cold reset exception.

The contents of all registers are preserved when the warm reset exception occurs, except for the following:

- the ErrorPC register, which contains the restart PC (program counter)
- the BEV and SR bits of the Status register, which are set to one
- in R4000 mode, the ERL bit, which is set to one
- in R3000 mode, the KU/IE stack: KUo/IEo ← KUp/IEp ← KUc/IEc ← 0/0

Because warm reset can abort cache and bus operations, the cache and memory state is undefined when the warm reset exception occurs.

Servicing: The warm reset exception is serviced by saving the current processor state for diagnostic purposes and reinitializing in a manner similar to that for the cold reset exception.

4.4.6 Interrupt Exceptions

This section describes exceptions caused by nonmaskable interrupts, normal interrupts, and external vectored interrupts.

4.4.6.1 Nonmaskable Interrupt (NMI) Exception

Nonmaskable interrupts cannot be disabled. They occur when a catastrophic event, such as a power failure, requires immediate attention to maintain system integrity.

Cause: The nonmaskable interrupt exception occurs in response to the falling edge of the NMI signal. As the name implies, the NMI exception is not maskable; it occurs regardless of the settings of the EXL, ERL, and IE Status register bits.

Handling: The reset exception vector (0xBFC00000) is also used for this exception. The reset vector resides in unmapped and uncached CPU address space, so the hardware need not initialize the TLB or the cache to handle the NMI. The SR bit of the Status register is set to differentiate the NMI exception from a cold reset exception.

Because an NMI could occur in the middle of another exception, it is generally not possible to continue program execution after servicing an NMI. Unlike cold and warm reset, but in common with other exceptions, NMI is taken only at instruction boundaries. The states of the caches and memory system are preserved by this exception.

The contents of all registers in the CPU are preserved when this exception occurs, except for the following:

- the ErrorPC register, which contains the restart PC
- the BEV and SR bits of the Status register, which are set to one
- in R4000 mode, the ERL bit, which is set to one
- in R3000 mode, the KU/IE stack: KUo/IEo ← KUp/IEp ← KUc/IEc ← 0/0

Servicing: The NMI exception is serviced by saving the current processor state for diagnostic purposes and reinitializing the system in a manner similar to that for the cold reset exception.

4.4.6.2 Interrupt Exception

This section describes the cause of and response to an interrupt exception.

Cause: The interrupt exception occurs when one of the eight interrupt conditions is asserted. The significance of these interrupts depends on the specific system implementation. Each of the eight interrupts can be masked by clearing the corresponding bit in the Interrupt Mask field of the Status register, and all eight interrupts can be masked at once by clearing the IE bit of the Status register.

Handling: The common exception vector is used for this exception. The ExcCode field in the Cause register is set to Int. The IP field of the Cause register indicates the current interrupt requests. More than one of these bits may be set at the same time, and no bit may be set if an interrupt is asserted and then deasserted before the Cause register is read.
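Because cold reset, warm reset, and NMI share the vector at 0xBFC00000, a handler first inspects the SR bit. The sketch below (hypothetical names) places SR at Status bit 20, as implied by Figures 4.21 and 4.22, and transcribes the Figure 4.22 Status update. Telling a warm reset apart from an NMI requires system-specific information that this sketch does not model.

```c
#include <stdbool.h>
#include <stdint.h>

#define SR_SR_BIT (UINT32_C(1) << 20)   /* soft reset flag, Status bit 20 */

/* At 0xBFC00000: SR = 0 means cold reset; SR = 1 means warm reset or NMI. */
static bool is_cold_reset(uint32_t sr)
{
    return (sr & SR_SR_BIT) == 0;
}

/* Figure 4.22: BEV <- 1, bit 21 <- 0, SR <- 1, then ERL <- 1 (R4000 mode)
   or a push of the KU/IE stack with new current bits 0/0 (R3000 mode). */
static uint32_t sr_after_warm_reset_or_nmi(uint32_t sr, bool r3000_mode)
{
    sr = (sr & ~(UINT32_C(7) << 20)) | (UINT32_C(5) << 20);
    if (!r3000_mode)
        return sr | UINT32_C(0x4);
    return (sr & ~UINT32_C(0x3F)) | ((sr << 2) & UINT32_C(0x3C));
}
```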
The EPC register points at the first instruction for which processing was not completed, unless this instruction is in a branch delay slot. If it is, the EPC register points at the preceding branch instruction and the BD bit of the Cause register is set as an indicator.

Servicing: If the interrupt is caused by one of the two software-generated interrupts, the interrupt condition is cleared by setting the corresponding Cause register bit to zero. If the interrupt is hardware generated, the interrupt condition is cleared by correcting the condition that caused the interrupt signal to be asserted.

4.4.6.3 External Vectored Interrupt Exception

The CW4011 implements an external vectored interrupt interface, which consists of an interrupt input (EXVINTN), an interrupt vector virtual address input (EXVAP[31:2]), and an interrupt accepted output (EXVAEN). These signals must be asserted and deasserted on the rising edge of the system clock. This interrupt class can be enabled or disabled using the EVI bit in the CCC register (enabled when CCC24 = 1). This section describes the cause of and response to an external vectored interrupt exception.

Cause: An external vectored interrupt occurs when EXVINTN is asserted. The significance of this interrupt depends on the specific system implementation. The interrupt can be masked by clearing the IE bit (IEc in R3000 mode) of the Status register.

Handling: The virtual address supplied on the EXVAP[31:2] inputs specifies the target exception handling routine; this address must be provided by a user-defined interrupt controller. The EXVINTN and EXVAP[31:2] inputs must be held stable and valid until the exception is accepted, which is indicated by the assertion of the EXVAEN output for one cycle.

The EPC register points at the first instruction for which processing was not completed, unless this instruction is in a branch delay slot. If it is, the EPC register points at the preceding branch instruction, and the BD bit of the Cause register is set as an indicator.
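A handler for the normal interrupt exception typically combines the IP field with the interrupt mask and, for a software interrupt, clears the request by writing zero to its Cause bit. In this sketch the names are hypothetical; IP[7:0] = Cause[15:8] follows Figure 4.23, while IM[7:0] = Status[15:8] and software interrupts at IP[1:0] are the usual MIPS placement and are assumed here. The global IE bit is not checked.

```c
#include <stdint.h>

#define IP_SHIFT 8
#define IP_MASK  (UINT32_C(0xFF) << IP_SHIFT)   /* Cause IP / Status IM (assumed) */

/* Interrupt lines that are both requested and unmasked, as bits 0..7.
   The result can be zero if a request was withdrawn before Cause was read. */
static unsigned pending_unmasked_interrupts(uint32_t cause, uint32_t sr)
{
    return (unsigned)(((cause & sr) & IP_MASK) >> IP_SHIFT);
}

/* Clear software interrupt n (0 or 1) by writing 0 to its Cause bit.
   Returns the Cause value to write back. */
static uint32_t clear_software_interrupt(uint32_t cause, unsigned n)
{
    return cause & ~(UINT32_C(1) << (IP_SHIFT + (n & 1u)));
}
```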
Servicing: The interrupt condition can be cleared in the user-defined interrupt controller in one of two ways: by detecting the assertion of the interrupt accepted output (EXVAEN), or by correcting the condition that caused the interrupt pin (EXVINTN) to be asserted.

4.4.7 Address Error Exception

This section describes the cause of and response to an address error exception.

Cause: The address error exception occurs when an attempt is made to do any of the following:

- load, fetch, or store a word that is not aligned on a word boundary
- load or store a halfword that is not aligned on a halfword boundary
- reference the kernel address space from user mode

The address error exception is not maskable.

Handling: The common exception vector is used for this exception. The ExcCode field of the Cause register is set based on the type of reference that caused the exception: AdEL for a data load or instruction fetch, AdES for a data store operation.

When the address error exception occurs, the BadVAddr register retains the virtual address that was not properly aligned or that referenced protected address space. The contents of the VPN field of the Context and EntryHi registers are undefined, as are the contents of the EntryLo register.

The EPC register points at the instruction that caused the exception, unless this instruction is in a branch delay slot. If it is, the EPC register points at the preceding branch instruction and the BD bit of the Cause register is set.

Servicing: The process executing at the time should be handed a segmentation violation signal. This error is usually fatal to the process incurring the exception.
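The three address error conditions translate into a short check. This sketch uses hypothetical names and assumes that the kernel address space is every address with bit 31 set (kseg0, kseg1, kseg2), as on other MIPS cores; that layout is not stated in this section. Instruction fetches are treated as loads, so they report AdEL.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { ADDR_OK, ADDR_ADEL, ADDR_ADES } addr_check;

/* size is 1, 2 or 4 bytes; byte accesses can never be misaligned.
   ASSUMPTION: kernel space = addresses with bit 31 set. */
static addr_check check_address(uint32_t vaddr, unsigned size,
                                bool is_store, bool user_mode)
{
    bool misaligned = (vaddr & (size - 1u)) != 0;
    bool kernel_ref = user_mode && (vaddr & UINT32_C(0x80000000)) != 0;
    if (misaligned || kernel_ref)
        return is_store ? ADDR_ADES : ADDR_ADEL;
    return ADDR_OK;
}
```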
4.4.8 TLB Exceptions

This subsection describes the TLB modified, TLB invalid, and TLB refill exceptions. If a specific design does not have a TLB, this section may be disregarded.

4.4.8.1 TLB Refill Exception

This section describes the cause of and response to a TLB refill exception.

Cause: The TLB refill exception occurs when no TLB entry matches a reference to a mapped address space. This exception is not maskable.

Handling: A special TLB refill exception vector is used for this exception, under the mode-specific conditions given at the end of this section. The ExcCode field of the Cause register is set based on the type of reference that caused the exception: TLBL for a data load or instruction fetch and TLBS for a data store operation. When the TLB refill exception occurs, the BadVAddr, Context, and EntryHi registers hold the virtual address that failed translation. The EntryHi register also contains the address space identifier (ASID) from which the translation fault occurred. The Random register normally contains a valid location in which to place the replacement TLB entry. The contents of the EntryLo register are undefined.

The EPC register points at the instruction that caused the exception, unless the instruction is in a branch delay slot. If it is, the EPC register points at the preceding branch instruction and the BD bit of the Cause register is set.

R4000 mode: The special exception vector is used when the exception level at the time the TLB miss is detected is normal (EXL = 0). If the exception level is exception (EXL = 1), the common exception vector is used.

R3000 mode: The special exception vector is used when user-mode or kernel-mode references to user memory space (kuseg) do not find a matching entry in the TLB. If the reference is to kernel memory space (kseg2), the common exception vector is used.
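The mode-dependent rule for choosing between the special refill vector and the common vector can be captured in one predicate. The function name is hypothetical, and this sketch treats kuseg as addresses below 0x80000000, which is the usual MIPS layout and an assumption here.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when a TLB miss uses the special refill offset (0x000); false when it
   uses the common offset. R4000 mode: EXL = 0 at miss time. R3000 mode: the
   reference is to kuseg (assumed: vaddr < 0x80000000). */
static bool refill_uses_special_vector(bool r3000_mode, bool exl, uint32_t vaddr)
{
    if (r3000_mode)
        return vaddr < UINT32_C(0x80000000);
    return !exl;
}
```

A nested refill inside the R4000 refill handler (EXL = 1) therefore goes to the common vector, which is the situation discussed under Servicing below.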
Servicing: To service this exception, the contents of the Context register are used as a virtual address to fetch the memory locations that contain the physical page frame and access control bits for a TLB entry. This information is placed in the EntryHi and EntryLo registers and written into the TLB.

The virtual address used to obtain the physical address and access control information may itself be on a page that is not resident in the TLB. In this case, a TLB refill exception is allowed inside the TLB refill handler. While the first exception goes to the special exception vector offset (0x000), the second exception goes to the common exception vector offset (0x180 in R4000 mode).

The second TLB refill exception overwrites the contents of the BadVAddr, Context, and EntryHi registers within the TLB refill handler. As a result, the exact virtual address whose translation caused the first fault is not known unless the TLB refill handler specifically saved it; only the failing PTE virtual address can be observed. The BadVAddr register now contains the original contents of the Context register within the TLB refill handler, which is the PTE address for the original failing address. The operating system can therefore determine the original virtual page number that caused the fault, but not the complete address.

The operating system uses this information to fetch the PTE that contains the physical address and access control information. It then writes the entry into the TLB and returns to the original user program. Returning to the TLB refill handler at this point should be avoided.

R4000 mode: Because the EXL bit is set, the EPC from the first TLB refill exception is not overwritten by the second TLB refill exception. Consequently, the appropriate return address can be determined from the current EPC value and the BD bit of the Cause register.

R3000 mode: The TLB refill handler must save the first refill's EPC and Cause[BD] information in a way that allows the second refill to find it. Using this saved EPC and Cause[BD] information, the appropriate return address can be determined.
4.4.8.2 TLB Invalid Exception

This section describes the cause of and response to a TLB invalid exception.

Cause: The TLB invalid exception occurs when a virtual address reference matches a TLB entry that is marked invalid. This exception is not maskable.

Handling: The common exception vector is used for this exception. The ExcCode field of the Cause register is set based on the type of reference that caused the exception: TLBL for a data load or instruction fetch, TLBS for a data store operation.

When the TLB invalid exception occurs, the BadVAddr, Context, and EntryHi registers hold the virtual address that failed translation. The EntryHi register also contains the ASID from which the translation fault occurred. The Random register normally contains a valid location in which to place the replacement TLB entry. The contents of the EntryLo register are undefined.

The EPC register points at the instruction that caused the exception, unless this instruction is in a branch delay slot. If it is, the EPC register points at the preceding branch instruction and the BD bit of the Cause register is set.

Servicing: The valid bit of a TLB entry is typically cleared when:

- a virtual address does not exist
- the virtual address exists but is not in main memory (a page fault)
- a trap is desired on any reference to the page (for example, to maintain a reference bit)

After the cause of this exception has been serviced, the TLB entry is located with the TLB Probe (TLBP) instruction and replaced by an entry with the valid bit set.
4.4.8.3 TLB Modified Exception

This section describes the cause of and response to a TLB modified exception.

Cause: The TLB modified exception occurs during a store operation, when the virtual address reference to memory matches a TLB entry that is marked valid but is not dirty (writable). This exception is not maskable.

Handling: The common exception vector is used for this exception. The ExcCode field in the Cause register is set to one, indicating a TLB modification exception (Mod).

When the TLB modified exception occurs, the BadVAddr, Context, and EntryHi registers hold the virtual address that failed translation. The EntryHi register also contains the ASID from which the translation fault occurred. The Random register normally contains a valid location in which to place the replacement TLB entry. The contents of the EntryLo register are undefined.

The EPC register points at the instruction that caused the exception, unless this instruction is in a branch delay slot. If it is, the EPC register points at the preceding branch instruction, and the BD bit of the Cause register is set.

Servicing: The kernel uses the failed virtual address and virtual page number to identify the corresponding access control information. The page identified may or may not permit write access. If writes are not permitted, a write protection violation has occurred.

If write access is permitted, the kernel marks the page frame as dirty/writable in its own data structures. The TLBP instruction is used to place the index of the TLB entry that must be altered in the Index register. The EntryLo registers are loaded with the physical page frame and access control bits (with the D bit set), and the EntryHi and EntryLo registers are written into the TLB.
4.4.9 Bus Error Exception

This section describes the cause of and response to a bus error exception.

Cause: The bus error exception occurs when signaled by board-level circuitry for events such as bus time-outs, bus parity errors, and invalid physical memory accesses. This exception is not maskable.

In the CW4011, bus errors are asynchronous with respect to CPU instruction processing (much like the NMI). This means that no attempt is made to identify the instruction that was the root source of the error.

Handling: The common exception vector is used for this exception. The ExcCode field in the Cause register is set to Bus. The EPC register points at the first instruction for which processing was not completed, unless the instruction is in a branch delay slot. If it is, the EPC register points at the preceding branch instruction and the BD bit of the Cause register is set.

Servicing: The physical address at which the fault occurred is not available to the exception handler. The process executing at the time of the exception must be handed a bus error signal, which is usually fatal.

4.4.10 Integer Overflow Exception

This section describes the cause of and response to an integer overflow exception.

Cause: The integer overflow exception occurs when an ADD, ADDI, SUB, DADD, DADDI, or DSUBI instruction results in a two's complement overflow. This exception is not maskable.

Handling: The common exception vector is used for this exception. The ExcCode field in the Cause register is set to Ov. The EPC register points at the instruction that caused the exception, unless the instruction is in a branch delay slot. If it is, the EPC register points at the preceding branch instruction and the BD bit of the Cause register is set.
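The overflow condition that triggers this exception can be checked in software with the standard sign-bit test. This sketch models only the 32-bit ADD/ADDI and SUB cases, using hypothetical function names. The arithmetic is done on uint32_t so the C code itself never performs signed overflow, which would be undefined behavior.

```c
#include <stdbool.h>
#include <stdint.h>

/* a + b overflows when both operands have the same sign and the result's
   sign differs from theirs. */
static bool add_overflows(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b, r = ua + ub;
    return ((~(ua ^ ub) & (ua ^ r)) >> 31) != 0;
}

/* a - b overflows when the operands have different signs and the result's
   sign differs from a's. */
static bool sub_overflows(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b, r = ua - ub;
    return (((ua ^ ub) & (ua ^ r)) >> 31) != 0;
}
```

For example, INT32_MAX + 1 and 0 - INT32_MIN both overflow, while -1 + 1 does not.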
Servicing: The process executing at the time of the exception should be handed an integer overflow signal. This error is usually fatal to the current process.

4.4.11 Trap Exception

This section describes the cause of and response to a trap exception.

Cause: The trap exception occurs when a TGE, TGEU, TLT, TLTU, TEQ, TNE, TGEI, TGEIU, TLTI, TLTIU, TEQI, or TNEI instruction evaluates to a true condition. This exception is not maskable.

Handling: The common exception vector is used for this exception. The ExcCode field in the Cause register is set to Tr. The EPC register points at the instruction that caused the exception unless the instruction is in a branch delay slot. If it is, the EPC register points at the preceding branch instruction and the BD bit of the Cause register is set.

Servicing: The process executing at the time of the exception should be handed a trap signal. This error is usually fatal.

4.4.12 System Call Exception

This section describes the cause of and response to a system call exception.

Cause: The system call exception occurs when an attempt is made to execute the SYSCALL instruction. This exception is not maskable.

Handling: The common exception vector is used for this exception. The ExcCode field in the Cause register is set to Sys. The EPC register points at the SYSCALL instruction that caused the exception unless this instruction is in a branch delay slot. If it is, the EPC register points at the preceding branch instruction and the BD bit of the Cause register is set as an indicator.

Servicing: When this exception occurs, control is transferred to the applicable system routine. To resume execution, the routine must restart instruction execution after the SYSCALL instruction. This restart address can be computed from the EPC register together with the BD and BT bits in the Cause register:

    if (BD = 0) then restart_pc = EPC + 4
    if ((BD = 1) and (BT = 0)) then restart_pc = EPC + 8
    if ((BD = 1) and (BT = 1)) then restart_pc = branch target address

When the SYSCALL instruction resides in a branch delay slot, the exception handler must obtain the branch target address from the preceding branch.

4.4.13 Breakpoint Exception

This section describes the cause of and response to a breakpoint exception.

Cause: The breakpoint exception occurs when an attempt is made to execute the BREAK instruction. This exception is not maskable.

Handling: The common exception vector is used for this exception. The ExcCode field in the Cause register is set to Bp. The EPC register points at the BREAK instruction that caused the exception unless this instruction is in a branch delay slot. If it is, the EPC register points at the preceding branch instruction and the BD bit of the Cause register is set as an indicator.

Servicing: When the breakpoint exception occurs, control is transferred to the applicable system routine. Additional distinctions can be made using the unused bits of the BREAK instruction (bits [25:6]), obtained by loading the instruction word at which the EPC register points. (If the BREAK instruction resides in a branch delay slot, four must be added to the EPC value to locate it.) To resume execution, the routine must continue with the instruction after the BREAK instruction. The restart address can be computed from the EPC register together with the BD and BT bits in the Cause register:

    if (BD = 0) then restart_pc = EPC + 4
    if ((BD = 1) and (BT = 0)) then restart_pc = EPC + 8
    if ((BD = 1) and (BT = 1)) then restart_pc = branch target address

When the BREAK instruction resides in a branch delay slot, the exception handler must obtain the branch target address from the preceding branch.

4.4.14 Reserved Instruction Exception

This section describes the cause of and response to a reserved instruction exception.

Cause: The reserved instruction exception occurs when an attempt is made to execute an instruction whose major opcode (bits [31:26]) is undefined, or a SPECIAL instruction whose minor opcode (bits [5:0]) is undefined. This exception also occurs for a REGIMM instruction whose minor opcode (bits [20:16]) is undefined. This exception is not maskable.

Handling: The common exception vector is used for this exception. The ExcCode field in the Cause register is set to RI. The EPC register points at the instruction that caused the exception unless this instruction is in a branch delay slot. If it is, the EPC register points at the preceding branch instruction and the BD bit of the Cause register is set.

Servicing: The reserved instruction exception can be used to trap to emulation routines for instructions not supported in the CW4011 instruction set. Once emulation has been completed, execution can be resumed at an address computed from the EPC register together with the BD and BT bits in the Cause register:

    if (BD = 0) then restart_pc = EPC + 4
    if ((BD = 1) and (BT = 0)) then restart_pc = EPC + 8
    if ((BD = 1) and (BT = 1)) then restart_pc = branch target address

When the instruction that received the reserved instruction exception resides in a branch delay slot, the exception handler must obtain the branch target address from the preceding branch.
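The three restart rules, which apply identically to SYSCALL, BREAK, and emulated reserved instructions, can be sketched in C as follows. The structure and function names are hypothetical, and the sketch assumes the handler has already decoded the preceding branch to obtain its target, since the hardware does not supply it. The second helper applies the EPC + 4 adjustment described above for locating a BREAK instruction that sits in a delay slot.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical snapshot of the state an exception handler reads.
   Field names are illustrative, not a CW4011 API. */
typedef struct {
    uint32_t epc;            /* value of the EPC register */
    bool     bd;             /* Cause.BD: excepting instruction was in a delay slot */
    bool     bt;             /* Cause.BT: the preceding branch was taken */
    uint32_t branch_target;  /* target of the preceding branch, decoded by the
                                handler itself from the branch at EPC */
} exc_state_t;

/* Address at which to resume after SYSCALL, BREAK, or an emulated
   reserved instruction, following the three rules in the manual. */
static uint32_t restart_pc(const exc_state_t *s)
{
    if (!s->bd)
        return s->epc + 4;        /* skip the excepting instruction itself */
    if (!s->bt)
        return s->epc + 8;        /* branch not taken: skip branch and delay slot */
    return s->branch_target;      /* branch taken: continue at its target */
}

/* Address of the excepting instruction itself (e.g. to read BREAK's
   code field): EPC, or EPC + 4 when it sat in a branch delay slot. */
static uint32_t faulting_insn_addr(const exc_state_t *s)
{
    return s->bd ? s->epc + 4 : s->epc;
}
```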
If there is no emulation routine, the process executing at the time of the exception should be given an illegal instruction signal. This error is usually fatal.

4.4.15 Floating-Point Exception

This section describes the cause of and response to a floating-point exception.

Cause: The floating-point exception is used by the floating-point coprocessor (if installed). The contents of the floating-point control/status register (inside CP1) indicate the cause of the exception.

Handling: The common exception vector is used for this exception. The ExcCode field in the Cause register is set to FPE. The EPC register points at the first instruction for which processing was not completed, unless this instruction is in a branch delay slot. If it is, the EPC register points at the preceding branch instruction and the BD bit of the Cause register is set.

Servicing: This exception is cleared by clearing the appropriate bit in the floating-point control/status register. For an unimplemented instruction exception, the kernel should emulate the instruction. For other exceptions, the kernel should pass the exception to the user process that caused it.

4.4.16 Coprocessor Unusable Exception

This section describes the cause of and response to a coprocessor unusable exception.

Cause: The coprocessor unusable exception occurs when an attempt is made to execute a coprocessor instruction for a coprocessor unit that has not been marked usable, or, for CP0 instructions, when the unit has not been marked usable and the process is executing in user mode. This exception is not maskable.

Handling: The common exception vector is used for this exception. The ExcCode field in the Cause register is set to CpU. The CE field in the Cause register indicates the coprocessor that the instruction attempted to reference. The EPC register points at the instruction that caused the exception unless this instruction is in a branch delay slot. If it is, the EPC register points at the preceding branch instruction and the BD bit of the Cause register is set.

Servicing: The coprocessor unit that the instruction attempted to reference is identified by the CE field of the Cause register. The handler then takes one of the following actions:
- If the process is entitled to access the coprocessor, the coprocessor is marked usable and the corresponding user state is restored.
- If the process is entitled to access the coprocessor, but the coprocessor does not exist or has failed, the coprocessor instruction can be interpreted (emulated) in software.
- If the process is not entitled to access the coprocessor, the process executing at the time should be given an illegal or privileged instruction signal. This error is usually fatal.

4.4.17 Debug Exception

This section describes the cause of and response to a debug exception.

Cause: The debug exception occurs when the CP0 detects a debug condition: a read or write access at the breakpoint data address, a read access at the breakpoint program counter, or a trace event. The Debug Control and Status (DCS) register specifies which event was detected.

R4000 mode: In R4000 mode, the debug exception can be masked by setting the EXL bit in the Status register. When this bit is set, a debug event does not cause an exception trap even if the DCS[TE] bit is set to one. However, the status bits of the DCS register are still updated to indicate that an event was recognized.

R3000 mode: In R3000 mode, the debug exception is not maskable.

Handling: The debug exception vector is used for this exception in both R4000 and R3000 modes.
The EPC register points at the instruction that caused the exception unless this instruction is in a branch delay slot. If it is, the EPC register points at the preceding branch instruction and the BD bit of the Cause register is set.

Servicing: The debug exception is a debugging aid. Typically the exception handler transfers control to a debugger, allowing you to examine the situation. To execute the failing instruction, the debug exception condition must be disabled; it must then be re-enabled.

Notes:
1. The trace status bit (DCS[5]) is set whenever a branch instruction is encountered, regardless of whether the branch is actually taken. However, if the debug exception trap is enabled (DCS[31] = 1), an exception is recognized only if the branch is taken and the target instruction is executed.
2. The program counter debug status bit (DCS[1]) is set whenever the target address of a branch falls within the specified PC address range (BPC, BPCM), regardless of whether the branch is actually taken. However, if the debug exception trap is enabled (DCS[31] = 1), an exception is recognized only if the branch is taken and the target instruction is executed.
Chapter 5
CW4011 Memory Management

This chapter describes the memory management functions of the system coprocessor (coprocessor 0). It contains the following sections:
- Section 5.1, TLB Physical Organization
- Section 5.2, Memory Management System
- Section 5.3, Virtual Memory and the TLB

Note that the translation lookaside buffer (TLB) is an optional module for the CW4011. If a specific design does not contain a TLB, any TLB references in this chapter may be ignored.

5.1 TLB Physical Organization

The physical implementation of the TLB consists of two main parts:
1. A two-entry instruction TLB (ITLB)
2. A 32-entry joint TLB (JTLB) that holds both instruction fetch and data access page translations

The CP0 can receive virtual address translation requests from both the ISU (instruction fetch) and the LSU (operand data access) during the same cycle. For maximum performance, these translations must occur in parallel. The two-piece TLB structure shown in Figure 5.1 addresses this problem by providing a separate two-entry TLB for instruction fetch translations. With this structure, ISU and LSU requests can be processed independently.
Figure 5.1 TLB Block Diagram
[Figure: the ISU and LSU each present vadr[31:12] to the CP0 MMU; the ITLB (2 entries) and the JTLB (32 entries) return padr[31:12]. Other labels: ITLB miss, TLB entry, stall pipe. vadr = virtual address; padr = physical address.]

The ITLB holds the two most recently used instruction fetch page translations. If a valid translation cannot be found in the ITLB, the CP0 stalls the pipeline for two cycles and searches the JTLB for a valid entry. If the CP0 finds a valid entry in the JTLB, it copies the entry into the less recently used ITLB entry and processing continues. If no valid entry is found, a TLB exception is posted (see Chapter 4, CW4011 Exception Processing, for details).

The entries in the ITLB are purged whenever the EntryHi register is written (for example, during a task switch). Consequently, the ITLB does not need to keep an eight-bit ASID for each entry, which reduces storage and match circuitry. This simplification should cause little or no performance penalty, because after such a write the entries would probably need to be replaced anyway.

When no TLB is present in the system, the TE field of the Configuration and Cache Control (CCC) register is cleared to zero. This is transparent to the other modules in the CW4011 core. The CP0 modifies its translation behavior as follows: physical address[31:12] = virtual address[31:12], and for kseg0 and kseg1, physical address[31:29] = 0 (the same as when a TLB is present).

The caching algorithm used for each access is based on the address segment being accessed (kuseg, kseg0, and kseg2 are cacheable; kseg1 is uncached) and on the CCC register fields IE0, IE1, DE0, DE1, and WB. Table 5.1 shows the criteria for the I-cache, and Table 5.2 lists the criteria for the D-cache.

Table 5.1 I-Cache Algorithm Criteria

Address Segment            I-Cache Enabled   I-Fetch Cache Algorithm
kuseg, kseg0, or kseg2     0                 Uncached
kuseg, kseg0, or kseg2     1                 Cached
kseg1                      x                 Uncached

Table 5.2 D-Cache Algorithm Criteria

Address Segment            D-Cache Enabled   WB   D-Cache Algorithm
kuseg, kseg0, or kseg2     0                 x    Uncached
kuseg, kseg0, or kseg2     1                 0    Cached, writethrough
kuseg, kseg0, or kseg2     1                 1    Cached, writeback
kseg1                      x                 x    Uncached
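Tables 5.1 and 5.2 reduce to a small decision function. This C sketch treats "cache enabled" as a single flag, an assumption that glosses over how the CCC set-enable bits (IE0/IE1, DE0/DE1) combine; the type and function names are hypothetical. kseg1 is identified by virtual address bits [31:29] = 0b101 (0xA0000000 to 0xBFFFFFFF).

```c
#include <stdint.h>
#include <stdbool.h>

typedef enum { ALG_UNCACHED, ALG_CACHED, ALG_WRITETHROUGH, ALG_WRITEBACK } cache_alg_t;

/* kseg1 spans 0xA0000000-0xBFFFFFFF, i.e. VA[31:29] == 0b101. */
static bool in_kseg1(uint32_t vaddr) { return (vaddr >> 29) == 0x5u; }

/* Table 5.1: kseg1 is always uncached; other segments follow the enable. */
static cache_alg_t icache_alg(uint32_t vaddr, bool icache_enabled)
{
    if (in_kseg1(vaddr) || !icache_enabled)
        return ALG_UNCACHED;
    return ALG_CACHED;
}

/* Table 5.2: when the D-cache is enabled, WB selects writeback vs writethrough. */
static cache_alg_t dcache_alg(uint32_t vaddr, bool dcache_enabled, bool wb)
{
    if (in_kseg1(vaddr) || !dcache_enabled)
        return ALG_UNCACHED;
    return wb ? ALG_WRITEBACK : ALG_WRITETHROUGH;
}
```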
5.2 Memory Management System

The memory model used for the CW4011 processor is based on the R3000. To extend the CPU's address space, the virtual memory system translates addresses in a large virtual address space into the physical memory system. The CW4011 physical address space is 4 Gbytes and uses a 32-bit address. The virtual address is also 32 bits wide, and the maximum user process size is 2 Gbytes (2^31 bytes).

The virtual address is extended with an 8-bit address space identifier (ASID) to reduce how often the TLB must be flushed when switching context. The ASID is contained in the CP0 EntryHi register, described in Section 5.3.2.1, EntryHi Register, on page 5-10.

5.2.1 Operating Modes

This section describes the two modes of 32-bit CW4011 operation:
- User mode, in which nonsupervisory programs are executed
- Kernel mode, which is analogous to the supervisory mode provided by many machines

The CW4011 usually operates in user mode until an exception forces it into kernel mode. It remains in kernel mode until a Restore From Exception (RFE) instruction (R3000 mode) or an Exception Return (ERET) instruction (R4000 mode) is executed, restoring the processor to the mode in effect before the exception.

Address mapping differs between kernel and user modes. To simplify the management of user state from within the kernel, the user-mode address space is a subset of the kernel-mode address space. Figure 5.2 shows the virtual-to-physical memory map for both the user-mode and kernel-mode segments.
Figure 5.2 CW4011 Virtual Memory Map

Virtual Address Range       Segment   Attributes
0xC0000000-0xFFFFFFFF       kseg2     Kernel, mapped, cacheable
0xA0000000-0xBFFFFFFF       kseg1     Kernel, unmapped, uncached
0x80000000-0x9FFFFFFF       kseg0     Kernel, unmapped, cached
0x00000000-0x7FFFFFFF       kuseg     User, mapped, cacheable

kseg0 and kseg1 (512 Mbytes each) both map onto physical addresses 0x00000000-0x1FFFFFFF; kuseg and kseg2 can map to any physical address in the 4-Gbyte physical space (0x00000000-0xFFFFFFFF).

5.2.2 User Mode Virtual Addressing

In user mode, a single, uniform virtual address space (kuseg) of 2 Gbytes (2^31 bytes) is available. The user segment starts at address 0x00000000, and all valid accesses have the most-significant bit cleared to zero. Referencing an address with the most-significant bit set while in user mode causes an address error exception. The TLB maps all references to kuseg identically in either mode and controls cache accessibility. kuseg typically holds user code and data for the current user process. The processor state definitions of user and kernel modes are described in Section 4.3.6, Status Register.
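A sketch of the segment decode implied by Figure 5.2, together with the user-mode address error check and the kseg0/kseg1 direct mapping (physical address[31:29] = 0). The names are illustrative only, and the mapped segments (kuseg, kseg2) still require a TLB lookup, which this sketch does not perform.

```c
#include <stdint.h>
#include <stdbool.h>

typedef enum { SEG_KUSEG, SEG_KSEG0, SEG_KSEG1, SEG_KSEG2 } segment_t;

/* Classify a 32-bit virtual address using the boundaries in Figure 5.2. */
static segment_t segment_of(uint32_t va)
{
    if (va < 0x80000000u) return SEG_KUSEG;
    if (va < 0xA0000000u) return SEG_KSEG0;
    if (va < 0xC0000000u) return SEG_KSEG1;
    return SEG_KSEG2;
}

/* User mode may only reference kuseg (MSB clear); otherwise address error. */
static bool user_address_error(uint32_t va, bool user_mode)
{
    return user_mode && (va & 0x80000000u) != 0;
}

/* kseg0/kseg1 are unmapped: clearing VA[31:29] gives the physical address
   in the first 512 Mbytes. Only meaningful for those two segments. */
static uint32_t unmapped_pa(uint32_t va)
{
    return va & 0x1FFFFFFFu;
}
```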
5.2.3 Kernel Mode Virtual Addressing

As shown in Figure 5.2, the kernel-mode virtual address space is divided into regions, differentiated by the high-order bits of the address:

- kuseg starts at virtual address 0x00000000 and is 2 Gbytes long. It allows selective caching and mapping on a per-page basis, rather than requiring an all-or-nothing approach. In kernel mode, references to this segment reach the same memory as user-mode references, as described previously.
- kseg0 starts at virtual address 0x80000000 and is 512 Mbytes long. The CW4011 direct-maps references within kseg0 onto the first 512 Mbytes of physical memory. These references use cache memory but do not use the TLB for address translation. Thus, kseg0 is typically used for kernel executable code and some kernel data.
- kseg1 starts at virtual address 0xA0000000 and is 512 Mbytes long. The CW4011 direct-maps references within kseg1 onto the first 512 Mbytes of physical memory. These references use neither cache memory nor the TLB for address translation. Thus, kseg1 is typically used by operating systems for I/O registers, ROM code, and disk buffers.
- kseg2 starts at virtual address 0xC0000000 and is 1024 Mbytes long. Like kuseg, it uses TLB entries to map virtual addresses to arbitrary physical ones, with or without caching. An operating system typically uses kseg2 for stacks and per-process data that must be remapped on context switches, as well as for user page tables and some dynamically allocated data areas.

5.3 Virtual Memory and the TLB

Mapped virtual addresses are translated into physical addresses using an on-chip TLB. The TLB is a fully associative memory that holds 32 entries, providing mappings for 32 physical page frames. The address range mapped by a page can be either 4 Kbytes or 16 Mbytes.

When address mapping is indicated, each TLB entry is simultaneously matched against the virtual address extended by the current ASID stored in the EntryHi register. If there is a match (hit), the physical page number is extracted from the TLB and concatenated with the offset to form the physical address, as shown in Figure 5.3.
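The concatenation step can be shown in a few lines of C. This is a minimal sketch, assuming pfn holds the 20-bit PFN field (physical address bits [31:12]) of the matching entry and that page16m reflects the entry's mask bit; the function name is hypothetical. For a 16-Mbyte page only PFN bits [31:24] are used, matching the PageMask table in Section 5.3.2.3.

```c
#include <stdint.h>
#include <stdbool.h>

/* Form the physical address on a TLB hit: the page frame replaces the
   virtual page number and the offset passes through unchanged.
   pfn is the 20-bit PFN field (physical address bits [31:12]). */
static uint32_t tlb_physical_address(uint32_t va, uint32_t pfn, bool page16m)
{
    uint32_t offset_mask = page16m ? 0x00FFFFFFu   /* offset = VA[23:0] */
                                   : 0x00000FFFu;  /* offset = VA[11:0] */
    uint32_t frame = (pfn << 12) & ~offset_mask;   /* keep PFN bits above the offset */
    return frame | (va & offset_mask);
}
```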
Figure 5.3 CW4011 Virtual Address Format
[Figure: with a 4-Kbyte page, the extended virtual address consists of ASID[39:32], VPN[31:12], and offset[11:0]; with a 16-Mbyte page, it consists of ASID[39:32], VPN[31:24], and offset[23:0]. Bits 31:29 select user or kernel address spaces. The VPN undergoes virtual-to-physical translation, and the offset is passed unchanged into the 32-bit physical address.]

If no match occurs (a TLB miss), an exception is taken. Typically, software refills the TLB from a page table maintained by the system. Software can overwrite a selected TLB entry or use a hardware mechanism to write into a random location.

The CW4011 does not support the TLB-shutdown (TS) bit in the Status register, which on other implementations indicates that more than one TLB entry matches the virtual address being translated. If more than one TLB entry matches the virtual address, the virtual address may be translated to an incorrect physical address. System software must ensure that this situation never arises.

5.3.1 TLB Entry Format

Figure 5.4 shows the TLB entry format that the CW4011 uses for 32-bit addressing. Each field of an entry has a corresponding field in the EntryHi, EntryLo, or PageMask register, described in Sections 5.3.2.1, 5.3.2.2, and 5.3.2.3.
Figure 5.4 Format of CW4011 TLB Entry
[Figure: 96-bit entry. Bits [95:78] R; [77] M; [76:64] R; [63:44] VPN; [43:40] R; [39:32] ASID; [31:26] R; [25:6] PFN; [5:3] C; [2] D; [1] V; [0] G.]

R (Reserved), bits [95:78], [76:64], [43:40], and [31:26]: These bits are not used and read as zero. The CW4011 ignores attempts to set these bits; however, software should write these bits as zero to ensure compatibility with future versions of the software.

M (Mask), bit 77: This is the page mask bit. It is set to one for a 16-Mbyte page and cleared to zero for a 4-Kbyte page.

VPN (Virtual Page Number), bits [63:44]: This field contains the virtual page number.

ASID (Address Space ID), bits [39:32]: This field contains the address space ID.

PFN (Page Frame Number), bits [25:6]: This field contains the page frame number, which supplies the upper bits of the physical address.

C (Cache), bits [5:3]: This field contains the cache algorithm, which specifies whether references to the page are cached. If they are, one of two algorithms can be selected: writeback or writethrough. The following table shows how the C bits are decoded.

C Bits   Value   Algorithm
000      0       Reserved
001      1       Reserved
010      2       Uncached
011      3       Cacheable, writethrough
100      4       Reserved
101      5       Reserved
110      6       Reserved
111      7       Cacheable, writeback

D (Dirty), bit 2: If this bit is set to one, the page is marked dirty and writable.

V (Valid), bit 1: If this bit is set to one, the TLB entry is valid.

G (Global), bit 0: If this bit is set to one, the contents of the ASID field are ignored during TLB lookup.

5.3.2 TLB Support Registers

Table 5.3 lists the registers used in association with the CP0 TLB.

Table 5.3 TLB Support Registers

Name                CP0 Register Number   Reference Page
EntryHi register    10                    5-10
EntryLo register    2                     5-11
PageMask register   5                     5-12
Index register      0                     5-13
Random register     1                     5-13
Wired register      6                     5-14
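A minimal C decoder for the C field, following the encoding table above. The enum and function names are hypothetical; the sketch simply reports the reserved encodings, since this section does not say how the core behaves if software writes them.

```c
#include <stdint.h>

typedef enum { C_RESERVED, C_UNCACHED, C_WRITETHROUGH, C_WRITEBACK } c_alg_t;

/* Decode the 3-bit C field, bits [5:3] of EntryLo or of a TLB entry. */
static c_alg_t decode_c_field(uint32_t entrylo)
{
    switch ((entrylo >> 3) & 0x7u) {
    case 2:  return C_UNCACHED;
    case 3:  return C_WRITETHROUGH;
    case 7:  return C_WRITEBACK;
    default: return C_RESERVED;   /* encodings 0, 1, 4, 5, 6 */
    }
}
```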
5.3.2.1 EntryHi Register

The EntryHi register is a read/write register used to access the TLB. In addition, this register contains the current ASID value for the processor. The ASID value is used to match the virtual address with a TLB entry during virtual address translation. Typically, the operating system assigns a unique ASID value to each known process. In this way, the mappings held in the TLB are made unique to the process whose ASID they match.

The EntryHi register holds the high-order bits of a TLB entry when performing TLB read and write operations. When a TLB refill, TLB invalid, or TLB modified exception occurs, the EntryHi register is loaded with the virtual page number (VPN) and the ASID of the virtual address that caused the exception. EntryHi is accessed by the TLBP, TLBWR, TLBWI, and TLBR instructions. Figure 5.5 shows the format of this register.

Figure 5.5 EntryHi Register
[Figure: bits [31:12] VPN; [11:8] R; [7:0] ASID.]

VPN (Virtual Page Number), bits [31:12]: This field contains the virtual page number.

R (Reserved), bits [11:8]: These bits are not used and read as zero. The CW4011 ignores attempts to set these bits; however, software should write these bits as zero to ensure compatibility with future versions of the software.

ASID (Address Space ID), bits [7:0]: This field contains the address space ID.
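The EntryHi layout can be sketched with two masks. The helper and macro names are hypothetical; on the CW4011 the resulting value would be moved into CP0 register 10 with an MTC0 instruction, which this sketch does not model.

```c
#include <stdint.h>

#define ENTRYHI_VPN_MASK  0xFFFFF000u  /* bits [31:12] */
#define ENTRYHI_ASID_MASK 0x000000FFu  /* bits [7:0]   */

/* Build an EntryHi value; reserved bits [11:8] are written as zero. */
static uint32_t entryhi_make(uint32_t vaddr, uint8_t asid)
{
    return (vaddr & ENTRYHI_VPN_MASK) | asid;
}

/* Extract the 20-bit VPN and the 8-bit ASID. */
static uint32_t entryhi_vpn(uint32_t entryhi)  { return entryhi >> 12; }
static uint8_t  entryhi_asid(uint32_t entryhi) { return (uint8_t)(entryhi & ENTRYHI_ASID_MASK); }
```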
5.3.2.2 EntryLo Register

The EntryLo register is a read/write register used to access the TLB. During TLB read and write operations, it contains the physical page frame number, cache algorithm, page dirty, translation valid, and global entry information. Figure 5.6 shows the format of this register.

Figure 5.6 EntryLo Register
[Figure: bits [31:26] R; [25:6] PFN; [5:3] C; [2] D; [1] V; [0] G.]

R (Reserved), bits [31:26]: These bits are not used and read as zero. The CW4011 ignores attempts to set these bits; however, software should write these bits as zero to ensure compatibility with future versions of the software.

PFN (Physical Page Frame Number), bits [25:6]: This field contains the physical page frame number.

C (Cache), bits [5:3]: This field contains the cache algorithm, which specifies whether references to the page are cached. If they are, one of two algorithms can be selected: writeback or writethrough. The following table shows how the C bits are decoded.

C Bits   Value   Algorithm
000      0       Reserved
001      1       Reserved
010      2       Uncached
011      3       Cacheable, writethrough
100      4       Reserved
101      5       Reserved
110      6       Reserved
111      7       Cacheable, writeback

D (Dirty), bit 2: If this bit is set to one, the marked page is dirty and writable.
V (Valid), bit 1: If this bit is set to one, the TLB entry is valid.

G (Global), bit 0: If this bit is set to one, the contents of the ASID field are ignored during TLB lookup, and the mapping is globally available to all ASIDs.

5.3.2.3 PageMask Register

The PageMask register is a read/write register used to access the TLB. It implements a variable page size by holding a per-entry comparison mask. When virtual addresses are presented for translation, the corresponding PageMask bit in the TLB specifies whether virtual address bits [23:12] participate in the comparison. Figure 5.7 shows the format of the PageMask register.

Figure 5.7 PageMask Register
[Figure: bits [31:14] R; [13] M; [12:0] R.]

R (Reserved), bits [31:14] and [12:0]: These bits are not used and read as zero. The CW4011 ignores attempts to set these bits; however, software should write these bits as zero to ensure compatibility with future versions of the software.

M (Mask), bit 13: This field contains the page mask. The following table shows the page size and the physical and virtual address bits for each setting of the mask bit.

Mask Bit   Page Size   Physical Address (from PFN)   Virtual Address (offset, passed unchanged)
1          16 Mbytes   PFN[31:24]                    [23:0]
0          4 Kbytes    PFN[31:12]                    [11:0]
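A sketch of how the single mask bit selects the VPN comparison width and the page size, following the table above. The macro and function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGEMASK_M (1u << 13)  /* PageMask bit 13: 1 = 16-Mbyte page */

static bool pagemask_is_16m(uint32_t pagemask) { return (pagemask & PAGEMASK_M) != 0; }

/* Virtual address bits that take part in the VPN comparison:
   [31:12] for a 4-Kbyte page, [31:24] for a 16-Mbyte page
   (bits [23:12] drop out of the compare). */
static uint32_t vpn_compare_mask(uint32_t pagemask)
{
    return pagemask_is_16m(pagemask) ? 0xFF000000u : 0xFFFFF000u;
}

/* Page size in bytes implied by the mask bit. */
static uint32_t page_size_bytes(uint32_t pagemask)
{
    return pagemask_is_16m(pagemask) ? (16u << 20) : (4u << 10);
}
```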
5.3.2.4 Index Register

The Index register is a 32-bit read/write register containing five bits that are used to index an entry in the TLB. The high-order bit indicates the success or failure of a TLB Probe (TLBP) instruction. The Index register also specifies the TLB entry that is affected by the TLB Read (TLBR) and TLB Write Index (TLBWI) instructions. Figure 5.8 shows the format of the Index register.

Figure 5.8 Index Register
[Figure: bit [31] P; [30:5] R; [4:0] Index.]

P (Probe), bit 31: If this bit is set to one, the last TLBP instruction failed to find a match.

R (Reserved), bits [30:5]: These bits are not used and read as zero. The CW4011 ignores attempts to set these bits; however, software should write these bits as zero to ensure compatibility with future versions of the software.

Index, bits [4:0]: This field contains the index of the TLB entry. The TLBR and TLBWI instructions use this index.

5.3.2.5 Random Register

The Random register is a 32-bit read-only register containing five bits that are used to index an entry in the TLB. The register decrements on every clock cycle. Its value ranges between a lower bound, set by the number of TLB entries reserved for exclusive use by the operating system (defined in the Wired register), and an upper bound, set by the total number of TLB entries (32 maximum).

The Random register specifies the TLB entry affected by the TLB Write Random (TLBWR) instruction. The register does not need to be read for this purpose, but it can be read to verify proper operation.
To simplify testing, the Random register is set to its upper bound when the system is reset. It is also set to its upper bound when the Wired register is written. Figure 5.9 shows the format of this register.

Figure 5.9 Random Register
[Figure: bits [31:5] R; [4:0] Random.]

R (Reserved), bits [31:5]: These bits are not used and read as zero. The CW4011 ignores attempts to set these bits; however, software should write these bits as zero to ensure compatibility with future versions of the software.

Random, bits [4:0]: This field contains the index of the TLB entry affected by the TLBWR instruction.

5.3.2.6 Wired Register

The Wired register is a read/write register that specifies the boundary between the wired entries of the TLB (fixed, nonreplaceable entries that cannot be overwritten by a TLBWR operation) and the random entries. Figure 5.10 shows how the Wired register partitions the TLB.

Figure 5.10 Wired Register Location
[Figure: TLB entries 0 through 31; the entries below the Wired register value form the range of wired entries, and the entries from the Wired value up to 31 form the range of random entries.]
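The interaction between Random and Wired can be modeled in software as below. This is a hypothetical model, not a hardware interface: it assumes a 32-entry TLB (upper bound 31), takes Wired's reset value of zero from the paragraph that follows, and assumes the counter wraps from the lower bound back to the upper bound, which the manual implies by bounding the range but does not spell out.

```c
#include <stdint.h>

#define TLB_ENTRIES 32u   /* assumption: fully populated 32-entry JTLB */

typedef struct {
    uint32_t wired;   /* lower bound of the random range */
    uint32_t random;  /* current Random register value */
} tlb_regs_t;

static void regs_reset(tlb_regs_t *r)
{
    r->wired  = 0;                  /* Wired resets to zero */
    r->random = TLB_ENTRIES - 1u;   /* Random resets to its upper bound */
}

static void write_wired(tlb_regs_t *r, uint32_t value)
{
    r->wired  = value & 0x1Fu;      /* only bits [4:0] are implemented */
    r->random = TLB_ENTRIES - 1u;   /* writing Wired reloads Random */
}

/* One clock tick: decrement, wrapping from the lower bound back to the top,
   so Random always stays within [wired, TLB_ENTRIES - 1]. */
static void clock_tick(tlb_regs_t *r)
{
    if (r->random <= r->wired)
        r->random = TLB_ENTRIES - 1u;
    else
        r->random--;
}
```

A TLBWR issued at any moment would then target entry r.random, which never falls into the wired range.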
When the system is reset, the Wired register is set to zero. Writing the register also sets the Random register to its upper bound. Figure 5.11 shows the format of the Wired register.

Figure 5.11 Wired Register
[Figure: bits [31:5] R; [4:0] Wired.]

R (Reserved), bits [31:5]: These bits are not used and read as zero. The CW4011 ignores attempts to set these bits; however, software should write these bits as zero to ensure compatibility with future versions of the software.

Wired, bits [4:0]: This field defines the lower boundary of the random TLB entries.

5.3.3 Virtual Address Translation

During virtual-to-physical address translation, the CP0 compares the ASID and the highest 8 to 20 bits of the virtual address with the contents of the TLB. The number of virtual address bits compared depends on the page size: bits [31:24] for a 16-Mbyte page and bits [31:12] for a 4-Kbyte page. Figure 5.12 illustrates the TLB address translation process.
Figure 5.12 CW4011 TLB Address Translation Process
[Flowchart, as recoverable: Input virtual address. If in user mode and the MSB is 1: address error. If no VPN match: TLB refill. If G = 0 and no ASID match: TLB refill. If V = 0: TLB invalid. If the access is a write and D = 0: TLB mod. Otherwise output the physical address; if C = 010, access main memory, else access the cache. Bits G, V, D, and C are bits in the TLB entry; address error, TLB refill, TLB invalid, and TLB mod indicate exceptions.]

A virtual address matches a TLB entry when the VPN field of the virtual address equals the VPN field of the entry and either of the following conditions is true:
- The G bit of the TLB entry is set.
- The ASID held in the EntryHi register matches the ASID field in the TLB entry.
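The flowchart in Figure 5.12 and the matching rule above can be sketched as a C lookup over a software copy of the TLB. The entry structure and result codes are hypothetical; the sequential loop stands in for the hardware's parallel match and assumes that at most one entry matches, as system software must guarantee. Unmapped kseg0/kseg1 accesses are outside the scope of this sketch.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical decoded TLB entry (field names follow Figure 5.4). */
typedef struct {
    uint32_t vpn;     /* VA[31:12] */
    uint8_t  asid;
    bool     m;       /* 1 = 16-Mbyte page */
    uint32_t pfn;     /* PA[31:12] */
    uint8_t  c;       /* 3-bit cache algorithm */
    bool     d, v, g;
} tlb_entry_t;

typedef enum { XL_CACHE, XL_MEMORY, XL_ADDR_ERROR, XL_REFILL, XL_INVALID, XL_MOD } xl_result_t;

/* Follows Figure 5.12 for a mapped access. On success *pa holds the
   physical address; XL_MEMORY means C = 010, so the cache is bypassed. */
static xl_result_t tlb_translate(const tlb_entry_t tlb[], int n, uint32_t va,
                                 uint8_t cur_asid, bool user_mode, bool is_write,
                                 uint32_t *pa)
{
    if (user_mode && (va & 0x80000000u))
        return XL_ADDR_ERROR;
    for (int i = 0; i < n; i++) {
        const tlb_entry_t *e = &tlb[i];
        uint32_t cmp = e->m ? 0xFF000000u : 0xFFFFF000u;  /* VPN bits compared */
        if (((e->vpn << 12) & cmp) != (va & cmp))
            continue;                                     /* no VPN match */
        if (!e->g && e->asid != cur_asid)
            continue;                                     /* not global, ASID differs */
        /* Matching entry: V plays no part in matching, only in validity. */
        if (!e->v)
            return XL_INVALID;
        if (is_write && !e->d)
            return XL_MOD;
        *pa = ((e->pfn << 12) & cmp) | (va & ~cmp);
        return (e->c == 2u) ? XL_MEMORY : XL_CACHE;
    }
    return XL_REFILL;
}
```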
Although the V bit of the TLB entry must be set for a valid translation to take place, it is not involved in determining a matching TLB entry.

If a TLB entry matches, the physical address and access control bits (C, D, and V) are retrieved from the entry. If no match is found, a TLB miss (refill) exception occurs. If the access control bits indicate that the access is not valid, a TLB modification exception (D = 0 on a write) or a TLB invalid exception (V = 0) occurs. If the C bits equal 0b010, the retrieved physical address is used to access main memory, bypassing the cache.

5.3.4 TLB Instructions

Table 5.4 lists the instructions that the CW4011 provides for working with the TLB.

Table 5.4 TLB Instructions

TLB Probe (TLBP): The Index register is loaded with the index of the TLB entry whose contents match the contents of the EntryHi register. If no TLB entry matches, the highest-order bit of the Index register is set. Results are undefined if a TLB reference encounters more than one matching TLB entry.

TLB Read (TLBR): This instruction loads the EntryHi, EntryLo, and PageMask registers with the contents of the TLB entry specified by the Index register.

TLB Write Index (TLBWI): This instruction loads the TLB entry specified by the Index register with the contents of the EntryHi, EntryLo, and PageMask registers.

TLB Write Random (TLBWR): This instruction loads the TLB entry specified by the Random register with the contents of the EntryHi, EntryLo, and PageMask registers.

Notes:
1. If the TLB is not present or not enabled in the system, the CP0 signals a coprocessor unusable exception when an attempt is made to execute any of the TLB instructions.
2. A TLB instruction (TLBP, TLBR, TLBWI, or TLBWR) cannot be immediately preceded or followed by a data load instruction that requires address translation (that is, a load from kuseg or kseg2).
3. The instruction before a TLB write instruction (TLBWI or TLBWR) must not generate an exception. Using a NOP in that position is recommended to ensure this restriction is met.
4. Three instructions are needed between an MTC0 to EntryHi, EntryLo, PageMask, or Index and a subsequent TLBWI or TLBWR instruction for the MTC0 operation to be properly reflected.
Chapter 6
CW4011 Caches

This chapter describes the CW4011 caches and cache maintenance. It contains the following sections:
- Section 6.1, Cache Memory Organization
- Section 6.2, Cache States
- Section 6.3, Address and Cache Tag
- Section 6.4, Cache Scratchpad RAM Mode
- Section 6.5, External Invalidation
- Section 6.6, Cache Instructions

6.1 Cache Memory Organization

The CW4011 has separate caches for instructions and data: the I-cache and the D-cache. They are organized as follows:
1. The I-cache and D-cache can each be organized as a direct-mapped or a two-way set associative cache. In a two-way set associative configuration, the I-cache uses a least recently used (LRU) replacement algorithm, while the D-cache uses a random replacement algorithm.
2. The cache controllers support configurations of 1, 2, 4, or 8 Kbytes per set. Thus, the smallest supported configuration is a 1-Kbyte direct-mapped cache, and the largest is a 16-Kbyte two-way set associative cache with 8 Kbytes per set.
3. The caches are indexed with a virtual address.
4. They are tagged with a physical address tag.
5. One cache line consists of eight words (four doublewords), and each word contains four 8-bit bytes. Refill address ordering wraps around from the missing address.
6. The D-cache supports both writeback and writethrough modes. If the system has no memory management unit (MMU), the WB bit in the CCC register defines the mode for all cacheable regions of memory: when WB is zero, the mode is writethrough; when it is one, the mode is writeback. If the system has an MMU, the translation lookaside buffer (TLB) entry determines the mode on a per-page basis.
7. A scratchpad RAM mode is available; it works similarly to the scratchpad RAM in the LR33300. This mode is discussed in more detail in Section 6.4, Cache Scratchpad RAM Mode.

6.2 Cache States

This section describes the cache states for the I-cache, the writethrough D-cache, and the writeback D-cache.

6.2.1 I-Cache and Writethrough D-Cache

The I-cache and the D-cache (when operating in writethrough mode) require only two states: invalid and valid clean. Initialization sets all cache lines to the invalid state, using either the cache invalidate mode described in Section 6.6.3, Cache Maintenance by CCC Register, or the cache flush instructions described in Section 6.6.1, FLUSH (All Cache Invalidation).

The first time a cache line is refilled because of a cache miss, its state changes from invalid to valid clean. The line remains in the valid clean state until it is forced back to invalid by one of the following events:
- An external invalidate
- Execution of a cache flush instruction

The V bit of each cache line indicates its state: V = 0 is invalid, and V = 1 is valid clean. Figure 6.1 shows the state diagram for the I-cache and the writethrough D-cache.
Figure 6.1 Cache State Diagram: I-Cache and Writethrough D-Cache
[Figure: two states, invalid and valid clean. Load-miss, then refill: invalid to valid clean. Invalidation: valid clean to invalid. Other transitions are labeled load-miss/refill, load-hit, store-miss, and store-hit.]

6.2.2 Writeback D-Cache

When the D-cache operates in writeback mode, three cache line states are required: invalid, valid clean, and valid dirty. Figure 6.2 shows the state diagram for the writeback D-cache. The V bit and WB bit of each line indicate the state, as shown in Table 6.1.

Figure 6.2 Cache State Diagram: D-Cache Writeback
[Figure: three states, invalid, valid clean, and valid dirty. Transitions are labeled load-miss, then refill; load-hit; store-hit; store-miss; load-miss, then writeback and refill; and invalidation.]

Table 6.1 D-Cache Writeback Mode

State         V Bit   WB Bit   Condition
Invalid       0       x (0)    The cache line does not contain valid information.
Valid clean   1       0        The cache line contains valid information consistent with memory.
Valid dirty   1       1        The cache line contains valid information, but it is not consistent with memory.
A store operation is considered a D-cache hit when the tag matches the physical address and the V bit is set; the physical address must, of course, be in a cached area. When a store miss occurs, the state of the cache line does not change and the store data is not written into the D-cache. Instead, the store data is written to the four-word-deep write buffers, which pass it to the system's main memory.

Some lines, known as dirty lines, contain more recent information than main memory. Because data stored in the D-cache in writeback mode is not necessarily passed on to the external write controller immediately, you may occasionally need to force dirty lines to be written to main memory. You can do this with the writeback cache instruction. In a two-way set-associative configuration, the writeback cache instruction writes back the addressed line of both sets. The instruction does not check whether its address would hit or miss in the cache line to which it maps. If the line's WB bit is set, the line data is written back, which causes several stall cycles while data is read from the D-cache. The actual number of stall cycles depends on the memory access speed.

Cache lines can also be invalidated by an external bus master. A cache line is invalidated when the invalidate address matches the cache tag ID and the cache invalidation signal(s) are asserted.
6.3 Address and Cache Tag

Figure 6.3 illustrates the relationship between an instruction or data address and its cache memory location, for both direct-mapped and two-way set-associative cache configurations. The word offset field addresses a word within a line, the line number field addresses a line in the cache memory, and the cache tag ID field serves as the tag for the addressed line.

Figure 6.3 Address to Cache Tag and Line Number

Bits [31:9+n]  Cache tag ID
Bits [8+n:5]   Line number
Bits [4:2]     Word offset
Bits [1:0]     Byte within word

If the system has an MMU, the cache is indexed by the virtual address and tagged by the physical address. Because the minimum memory page size is 4 Kbytes, there is no virtual/physical address conflict if the cache set size is 4 Kbytes or less. If the cache set size is 8 Kbytes and the page size is 4 Kbytes, bit 12 of the virtual address must match bit 12 of the physical address. Table 6.2 shows how the value of n determines the cache size.

Table 6.2 Setting Cache Size

Cache Set Size (Kbytes)  Value of n
1                        1
2                        2
4                        3
8                        4
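The field layout of Figure 6.3 and the 8-Kbyte aliasing rule can be checked with a short sketch. The function names are hypothetical; the bit positions come from Figure 6.3 and Table 6.2, and the 4-Kbyte page size is the minimum page size stated above.

```python
N_FOR_SET_KB = {1: 1, 2: 2, 4: 3, 8: 4}  # Table 6.2
PAGE_BYTES = 4096                        # minimum MMU page size


def split_address(addr, set_kb):
    """Split a 32-bit address into (tag ID, line number, word offset)."""
    n = N_FOR_SET_KB[set_kb]
    addr &= 0xFFFFFFFF
    word_offset = (addr >> 2) & 0x7                    # bits [4:2]
    line_number = (addr >> 5) & ((1 << (n + 4)) - 1)   # bits [8+n:5]
    tag_id = addr >> (9 + n)                           # bits [31:9+n]
    return tag_id, line_number, word_offset


def index_is_consistent(va, pa, set_kb):
    """True if a virtual and a physical address select the same cache line."""
    if set_kb * 1024 <= PAGE_BYTES:
        return True  # index bits lie entirely inside the page offset
    # 8-Kbyte set: the index uses bit 12, which lies above the page offset.
    return ((va ^ pa) >> 12) & 1 == 0


print(split_address(0x12345678, 4))  # (74565, 51, 6), i.e. (0x12345, 0x33, 6)
```

The line number field is n + 4 bits wide, so a set of 2^(n+4) lines of 32 bytes gives 1, 2, 4, or 8 Kbytes, matching Table 6.2.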
6.4 Cache Scratchpad RAM Mode

Both CW4011 D-cache sets and I-cache set 1 can be configured as scratchpad RAM by setting the SR0, SR1, or ISR1 bits in the CCC register, as shown in Table 6.3. A scratchpad RAM must be located in one specific physical address space, like local data memory.

If the CW4011 ASIC device has D-cache or I-cache tag RAMs present, the tag contents must be programmed before scratchpad mode is enabled. To program the tags, set the CCC register bits as follows:
- Set ISC to one.
- Set TAG to one.
- Clear INV to zero.
- Set DE0, DE1, or IE1 to one, depending on which cache sets are to be placed in scratchpad mode.

In addition, instructions must be written into the instruction data RAM of I-cache set 1 before the CW4011 attempts to fetch an instruction from this RAM, because instructions cannot be written to the instruction data RAM while scratchpad mode is enabled.

If a D-cache or I-cache RAM is used only as a scratchpad RAM in an ASIC design, its cache tag RAMs can be physically removed from the device to save cost. In that case, the D-cache tag inputs of the core must be tied high or low according to the address of the scratchpad RAM area, and the corresponding ISR1, SR0, or SR1 bits must always be set to one.

Table 6.3 Scratchpad RAM Enables

CCC Register Bit Setting  Scratchpad Mode Enabled
SR0                       D-cache set 0
SR1                       D-cache set 1
ISR1                      I-cache set 1
When a cache scratchpad RAM is enabled, all accesses to the scratchpad RAM area are treated as local memory accesses, with no stall cycles.

6.5 External Invalidation

I-cache and D-cache lines can be invalidated by external hardware for bus snooping. The CW4011 has an invalidate strobe input and an invalidate address bus input. Writeback by external hardware is not supported. Details are described in Chapter 7, CW4011 Signals.

6.6 Cache Instructions

The CW4011 has two types of cache instructions: flush, for initialization, and writeback. Each cache instruction must be followed by three NOP instructions. Figure 6.4 shows the cache instruction format.

Figure 6.4 Cache Instruction Format

cache op, offset(base)

Bits [31:26]  CACHE opcode, 101111
Bits [25:21]  base (valid for WB only)
Bits [20:16]  op
Bits [15:0]   offset (valid for WB only)

op field:
Bits [20:18]  000 = flush (all cache invalidation); 001 = writeback (D-cache only)
Bit 17        D-cache effect (1) / non-effect (0)
Bit 16        I-cache effect (1) / non-effect (0)

Mnemonic             op      Function
flushi               00001   Flush I-cache
flushd               00010   Flush D-cache
flushid              00011   Flush I-cache and D-cache
wb, offset(base)     00100   Write back the D-cache line addressed by offset + GPR[base]
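The bit layout of Figure 6.4 can be made concrete with an encoder and decoder. This is a sketch for illustration (for example, for a test bench or disassembler); the function names are hypothetical, and the only fact used beyond the figure is that a MIPS NOP encodes as 0x00000000.

```python
CACHE_OPCODE = 0b101111  # bits [31:26]

# op field values (bits [20:16]) from Figure 6.4
FLUSHI, FLUSHD, FLUSHID, WB = 0b00001, 0b00010, 0b00011, 0b00100


def encode_cache(op, base=0, offset=0):
    """Assemble 'cache op, offset(base)' into a 32-bit instruction word."""
    if not (0 <= op < 32 and 0 <= base < 32):
        raise ValueError("op and base are 5-bit fields")
    if not -0x8000 <= offset <= 0xFFFF:
        raise ValueError("offset must fit in 16 bits")
    return (CACHE_OPCODE << 26) | (base << 21) | (op << 16) | (offset & 0xFFFF)


def decode_cache_op(word):
    """Return (kind, targets, base, offset) for a CACHE instruction word."""
    word &= 0xFFFFFFFF
    if word >> 26 != CACHE_OPCODE:
        raise ValueError("not a CACHE instruction")
    op = (word >> 16) & 0x1F
    function = op >> 2  # instruction bits [20:18]
    if function == 0b000:
        targets = [name for name, bit in (("dcache", 0b10), ("icache", 0b01)) if op & bit]
        kind = "flush" if targets else "nop"  # both effect bits zero: no operation
    elif function == 0b001:
        kind, targets = "writeback", ["dcache"]  # bits 17 and 16 are ignored
    else:
        kind, targets = "reserved", []
    return kind, targets, (word >> 21) & 0x1F, word & 0xFFFF


def with_required_nops(cache_word):
    """Append the three NOPs (0x00000000 in MIPS) that must follow a CACHE."""
    return [cache_word, 0, 0, 0]


print(hex(encode_cache(FLUSHID)))                   # 0xbc030000
print(hex(encode_cache(WB, base=4, offset=0x20)))   # 0xbc840020
```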
6.6.1 Flush (All Cache Invalidation)

One execution of a flush cache instruction can invalidate all lines of the D-cache, the I-cache, or both. Bit 17 of the instruction selects effect or non-effect for the D-cache, and bit 16 selects effect or non-effect for the I-cache. If both bits are zero, the instruction is a no-operation (NOP), and the base register and offset have no meaning.

One cache line of one or more cache sets is invalidated per clock cycle. Invalidation starts at the WB stage of the execution pipeline, and the pipeline stall request signal is asserted while the cache lines are being invalidated. If the pipeline cancel signal is asserted, the invalidation is not executed. Invalidation always takes 256 clock cycles, regardless of the cache size actually implemented. During this time, the CPU does not respond to interrupts.

6.6.2 Writeback

Writeback is effective for the D-cache only, so bits 17 and 16 are ignored. Bits [12:5] of the effective address, which is offset + GPR[base], specify the D-cache line. The number of bits actually used depends on the cache size: for example, in a 1-Kbyte direct-mapped or 2-Kbyte two-way set-associative cache, only bits [9:5] are used and the upper bits of the effective address are ignored. Note that the tag is not checked. For more information on cache sizing, see Appendix B, Cache Sizing and Design Concerns.

In a two-way set-associative cache, one writeback instruction writes back the addressed line of both sets if that line's WB bit is set. If the WB bit is cleared, no operation is performed. The writeback is executed at the WB stage and causes four stall cycles to read data from a dirty line. The WB bits are cleared after the cache lines are written back.
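Because the writeback instruction selects a line purely by index and ignores the tag, writing back the whole D-cache amounts to issuing one `wb` per line index. The sketch below computes the line index from offset + GPR[base] and lists the offsets for such a sweep. It assumes the 16-bit offset is sign-extended, as for other MIPS load/store-style instructions; the helper names are hypothetical.

```python
N_FOR_SET_KB = {1: 1, 2: 2, 4: 3, 8: 4}  # Table 6.2
LINE_BYTES = 32


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed offset (assumption)."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def wb_line_index(gpr_base, offset, set_kb):
    """Line index used by 'wb, offset(base)' for a given set size."""
    ea = (gpr_base + sign_extend16(offset)) & 0xFFFFFFFF
    n = N_FOR_SET_KB[set_kb]
    return (ea >> 5) & ((1 << (n + 4)) - 1)  # bits [8+n:5]; tag bits ignored


def wb_sweep_offsets(set_kb):
    """Offsets that visit every line index once; each wb covers both sets."""
    return range(0, set_kb * 1024, LINE_BYTES)


# The fixed 256-cycle flush equals the line count of the largest (8-Kbyte) set.
assert (8 * 1024) // LINE_BYTES == 256
```

A sweep over `wb_sweep_offsets(set_kb)` from any base address reaches every line, since only the index bits of the effective address matter.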
6.6.3 Cache Maintenance by CCC Register

Certain CCC register bits support D-cache and I-cache maintenance and testing. Table 6.4 lists the CCC register bits related to the cache.

Table 6.4 CCC Bits Related to Cache Configuration

Bit(s)    Function
IE0       I-cache set 0 enable
IE1       I-cache set 1 enable
IS[1:0]   I-cache set size (1, 2, 4, 8 Kbytes)
ISR1      I-cache scratchpad RAM enable
DE0       D-cache set 0 enable
DE1       D-cache set 1 enable
DS[1:0]   D-cache set size (1, 2, 4, 8 Kbytes)
WB        D-cache writeback/writethrough
SR0       D-cache set 0 scratchpad RAM enable
SR1       D-cache set 1 scratchpad RAM enable
ISC       D-cache/I-cache isolate cache mode enable
TAG       D-cache/I-cache tag test mode enable
INV       D-cache/I-cache invalidate mode enable

The CW4011 has three maintenance modes for maintaining and testing the internal I-cache and D-cache: data test, tag test, and invalidate. Before entering any of these modes, the processor must be executing in kseg1 (noncacheable address space), interrupts must be disabled, and the caches must be isolated (ISC bit = 1). When the caches are isolated, load and store instructions access the I-cache and D-cache; the system's external main memory is not affected by these accesses.

To enable a cache maintenance mode, use the following procedure:

1. Set the appropriate bits in the CCC register, with the ISC bit = 1. The MTC0 instruction can easily set these bits. The three instructions immediately following the MTC0 instruction should not be load or store instructions.
   The IE0, IE1, DE0, and DE1 bits in the CCC register select the cache set to be accessed, as shown in Table 6.5. Only one cache set should be enabled when performing a load operation; multiple cache sets may be enabled when performing a store operation.
   The TAG and INV bits in the CCC register select the cache maintenance function. Table 6.6 shows the encoding for the two bits.
2. Clear the IE bit in the Status register to disable all interrupts. This is usually done automatically, because cache maintenance operations are normally performed in an exception handler (most commonly the reset handler).

Table 6.5 Cache Set Selection

Bit Set  Bit Number  Cache Set Accessed
IE0      17          I-cache set 0
IE1      16          I-cache set 1
DE0      13          D-cache set 0
DE1      12          D-cache set 1

Table 6.6 TAG and INV Encoding

TAG (Bit 1)  INV (Bit 0)  Cache Maintenance Mode
0            0            Data test
1            0            Tag test
x            1            Invalidate

- Data Test Mode
In this mode, all loads and stores access the data RAMs selected by the IE0, IE1, DE0, and DE1 bits. The lower bits of the effective address specify the cache address; the exact bit field depends on the cache size and configuration actually implemented.
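Tables 6.5 and 6.6 can be expressed as two small decoding functions, which may help when writing or checking a maintenance routine. The bit numbers 17, 16, 13, and 12 come from Table 6.5; the positions of TAG and INV inside the CCC register are not given in this section, so they are passed in as booleans. All function names are hypothetical.

```python
SET_SELECT_BITS = {"ie0": 17, "ie1": 16, "de0": 13, "de1": 12}  # Table 6.5


def selected_sets(ccc):
    """Names of the cache sets enabled by a CCC register value."""
    return [name for name, bit in SET_SELECT_BITS.items() if (ccc >> bit) & 1]


def maintenance_mode(tag, inv):
    """Table 6.6: INV set means invalidate; otherwise TAG picks tag test."""
    if inv:
        return "invalidate"
    return "tag test" if tag else "data test"


def set_for_load(ccc):
    """Loads should see exactly one enabled set; stores may use several."""
    sets = selected_sets(ccc)
    if len(sets) != 1:
        raise ValueError(f"loads need exactly one enabled set, got {sets}")
    return sets[0]
```

For example, a CCC value with only bit 13 set selects D-cache set 0, so `set_for_load(1 << 13)` returns `"de0"`.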
- Tag Test Mode
When the TAG bit is set to one, the CW4011 is in tag test mode, and load and store operations access the tag RAMs. The tag bits available for testing in tag test mode are the tag data, hit, writeback (D-cache only), and valid bits. The hit bit is ignored during a store operation; for a load operation, the hit bit is set if a match occurs. The cache tag ID bits are written from, or compared to, the most significant bits of the effective address (offset + GPR[base]).

A load operation from the tag RAM returns the information shown in Figure 6.5. Bits [31:10] are the tag data; bit 2 is the hit bit; bit 1 is the valid bit, which reflects the setting of the INV bit in the CCC register; and bit 0 is the writeback bit, which reflects the setting of the WB bit in the CCC register. Bits [9:3] can be ignored.

Figure 6.5 Tag Test Mode Loaded Data Format

Bits [31:10]  Tag data
Bits [9:3]    x (ignore)
Bit 2         Hit
Bit 1         V
Bit 0         WB

- Invalidate Mode
When the INV bit in the CCC register is set to one, the CW4011 is in invalidate mode. Because the caches contain random data on both warm and cold starts, software must invalidate all lines in the I-cache and D-cache. Executing store word instructions invalidates the addressed cache line in the enabled cache(s). After reset, zero must be written into all tags for both sets of the D-cache and I-cache. The cache flush instructions can be used for the same purpose.
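The tag test load format in Figure 6.5, and the per-line stores needed in invalidate mode, can be sketched as follows. The decoder follows the figure directly; the invalidation helper assumes that, as for normal accesses, the line is selected by address bits [8+n:5], so stepping by the 32-byte line size reaches every line once. Both function names are hypothetical.

```python
def decode_tag_test_word(word):
    """Split a word loaded in tag test mode into its Figure 6.5 fields."""
    word &= 0xFFFFFFFF
    return {
        "tag": word >> 10,          # bits [31:10]
        "hit": (word >> 2) & 1,     # bit 2, meaningful for loads
        "valid": (word >> 1) & 1,   # bit 1
        "writeback": word & 1,      # bit 0, D-cache only
    }


def invalidate_offsets(set_kb, line_bytes=32):
    """Offsets of one store word per line for a set of set_kb Kbytes."""
    return range(0, set_kb * 1024, line_bytes)
```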
Chapter 7
CW4011 Signals

This chapter describes the CW4011 core I/O signals. You will find this chapter useful if you are interfacing the CW4011 with other core logic or external logic.

This chapter contains the following sections:
- Section 7.1, CW4011 Core Signal Interfaces
- Section 7.2, Control Interface
- Section 7.3, SCbus Interface
- Section 7.4, OCAbus Interface
- Section 7.5, Coprocessor Interface
- Section 7.6, Cache Invalidation Interface
- Section 7.7, Data Cache Interface
- Section 7.8, Instruction Cache Interface
- Section 7.9, Writeback Buffer Interface
- Section 7.10, Memory Management Unit (MMU) Interface
- Section 7.11, MMU to Shell Interface
- Section 7.12, Multiply/Divide Unit (MDU) Interface
- Section 7.13, Miscellaneous Signals

The following signal conventions are used in this chapter:
- Active-low signals have a lowercase n at the end of the signal name (for example, resetn).
- Active-high signals have a lowercase p at the end of the signal name (for example, scaop). Note that some of the MDU signals do not follow this convention.
- The term assert means to drive a signal true or active. The term deassert means to drive a signal false or inactive.
You can use the CW4011 core in a variety of designs and with a variety of peripheral logic. For this reason, it is not always possible to identify the agent that asserts and deasserts each I/O signal. The signal descriptions in this manual indicate the states to which the core's I/O signals must be driven; you can then select the design components needed to meet the core's signal requirements.

All interface signals are inputs to or outputs from the CW4011 core. All input signals must be synchronized to the rising edge of the system clock outside the CW4011. Asynchronous signals, such as resets or interrupts, must be synchronized by at least two flip-flops in series. All output signals are synchronized to the rising edge of the system clock inside the CW4011.

7.1 CW4011 Core Signal Interfaces

The core interface signals are divided into the following eleven categories:
1. Control signals, which interface to CP0
2. SCbus signals, which interface to the BIU
3. OCAbus signals, which interface with the LSU
4. Coprocessor signals, which interface to the ISU and LSU
5. Cache invalidation signals, which interface to the ISU and LSU
6. Data cache signals, which interface to the LSU
7. Instruction cache signals, which interface to the ISU
8. Writeback buffer signals, which interface to the LSU
9. Memory management unit signals, which interface with CP0
10. Multiply/divide unit (MDU) signals, which interface with the ALU
11. Miscellaneous signals, such as the system clock input and the endian input

Figure 7.1 illustrates the interface signal interconnections for the CW4011.
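The two-flip-flop requirement for asynchronous inputs can be illustrated with a cycle-level behavioural model. This sketch only shows the two-clock latency of a two-stage synchronizer; it cannot model metastability, which is the actual reason for the second flip-flop. The class name is hypothetical.

```python
class TwoFlopSynchronizer:
    """Behavioural model of two flip-flops in series on one clock."""

    def __init__(self, reset_value=0):
        self.stage1 = reset_value
        self.stage2 = reset_value

    def clock(self, async_in):
        """Apply one rising edge and return the synchronized output."""
        # Both flops update on the same edge: stage2 takes the old stage1.
        self.stage2, self.stage1 = self.stage1, async_in
        return self.stage2


sync = TwoFlopSynchronizer()
print([sync.clock(x) for x in [1, 1, 1, 0, 0, 0]])  # [0, 1, 1, 1, 0, 0]
```

A change on the asynchronous input becomes visible to the core logic after the second rising edge.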
[Figure 7.1 Core Interface Connections: CW4011 building blocks (shell, ALU, ISU, CP0, MMU, LSU, BIU, MDU, D-cache sets 0 and 1, I-cache sets 0 and 1) and the control, SCbus, writeback buffer, coprocessor, cache invalidation, OCAbus, and MDU interfaces]

Figure 7.2 shows the CW4011 core interface signals arranged in functional groups.
[Figure 7.2 CW4011 Logic Diagram: control, coprocessor, SCbus, OCAbus, cache invalidation, data cache, instruction cache, writeback buffer, memory management unit, multiply/divide unit, and miscellaneous signal groups]
7.2 Control Interface

This section describes the reset and interrupt signals that interface to CP0.

cresetn    Cold System Reset    Input
Asserting this signal asynchronously resets the CW4011 by initializing all internal states. cresetn has the highest priority of all the exception inputs and must be deasserted synchronously on the rising edge of sclkp. When it is deasserted, CP0 generates a cold reset exception (vector 0xBFC00000).

exvap[31:2]    External Vectored Interrupt Address    Input
These signals carry the interrupt vector address. The CW4011 accepts them when exvapen is asserted and writes them directly into the program counter. exvap[31:2] must remain stable until exvapen is deasserted.

exvapen    exvap Enable    Output
The CW4011 asserts exvapen to enable the interrupt vector address signals (exvap[31:2]) and deasserts it to disable them.

exintn[5:0]    External Interrupts    Input
External logic asserts an exintn[5:0] signal to make CP0 generate an interrupt exception. Assertion of these inputs is indicated in the IP[7:2] field of the Cause register, so the interrupting logic should continue to assert the external interrupt input until the exception routine has serviced the interrupt. The interrupt inputs can be individually disabled (masked) by setting the appropriate bits in the Status register. External interrupts are not recognized if the interrupt enable bit in the Status register is cleared; however, the input conditions are still reflected in the IP bits of the Cause register. See Section 4.3.6, Status Register, for more information.
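The relationship between the active-low exintn[5:0] inputs, the IP[7:2] field, the Status register mask, and the interrupt enable bit can be sketched as below. Two points are assumptions, not statements from this section: that exintn[k] maps to IP[k+2], and that the mask is supplied as a 6-bit value aligned with IP[7:2]. The register bit positions are deliberately left out, and the function names are hypothetical.

```python
def pending_ip(exintn):
    """IP[7:2] as a 6-bit value: each input driven low sets its bit."""
    return ~exintn & 0x3F


def interrupt_taken(exintn, im_mask, ie):
    """An interrupt is recognized only if enabled, unmasked, and pending."""
    return bool(ie) and (pending_ip(exintn) & im_mask) != 0


# exintn[0] asserted (low): IP2 is pending even while interrupts are disabled.
print(bin(pending_ip(0b111110)))                 # 0b1
print(interrupt_taken(0b111110, 0b000001, False))  # False
```

This mirrors the text: the pending bit stays visible in IP regardless of the mask, which is why the source must hold its request until the handler services it.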
exvintn    External Vectored Interrupt    Input
exvintn is an external interrupt input driven by an external interrupt controller. See Section 4.4.6.3, External Vectored Interrupt Exception, for further information.

nmin    Nonmaskable Interrupt    Input
nmin is a nonmaskable interrupt. When the CW4011 detects that nmin is asserted, CP0 generates a nonmaskable interrupt exception (vector 0xBFC00000).

wresetn    Warm System Reset    Input
To perform a warm reset, wresetn must be asserted and then deasserted synchronously on the rising edge of sclkp. While it is asserted, internal states are initialized; when it is deasserted, CP0 generates a warm reset exception (vector 0xBFC00000).

7.3 SCbus Interface

This section describes the SCbus interface signals, which interface with the BIU.

scaoen    Address Output Enable    Output
When asserted, scaoen indicates that the address output bus lines scaop[31:0] are valid. The CW4011 asserts this signal when the BIU is performing an SCbus transaction, and the signal remains active throughout the operation. scaoen also enables sctbstn, sctben, and sctpwn.

scaop[31:0]    Address Output Bus    Output
scaop[31:0] is the address output bus for instruction fetch and data read/write operations. It is valid only when the address output enable signal (scaoen) is asserted, and it remains valid throughout the operation until scbrdyn, scbrtyn, or scberrn is asserted.

scb32n    32-Bit Bus Width Sizing    Input
When asserted, scb32n indicates that the external bus slave on the SCbus needs 32-bit bus sizing. The CW4011 samples this signal on the rising edge of the clock that synchronizes scbrdyn. If the signal is asserted for a 64-bit transaction (a doubleword or part of a burst transaction), the BIU generates a subsequent 32-bit word transaction. The BIU also packs data to 64 bits for a read transaction or unpacks data to 32 bits for a write transaction.

scberrn    Bus Error    Input
scberrn is asserted to terminate the current transaction when a bus error occurs. If scbrdyn or scbrtyn is asserted at the same time as scberrn, scberrn has higher priority. Asserting scberrn causes CP0 to generate an exception.

scbgep    Bus Grant Enable    Output
scbgep reflects the value of the BEG bit in the CCC register. When scbgep is low, the CW4011 is not accepting bus hold requests; when it is high, the core is accepting them.

scbpwan    Bus In-Page Write Accept    Input
scbpwan indicates that the external bus slave on the SCbus will accept in-page write transactions. External logic asserts scbpwan, and the core samples it on the rising edge of the clock that synchronizes scbrdyn. If sctpwn is not asserted, scbpwan has no effect.

scbrdyn    Bus Ready    Input
The system asserts scbrdyn for one cycle when the current transaction terminates successfully. Asserting scbrdyn indicates that the SCbus is available for another transaction.

scbrtyn    Bus Retry    Input
An SCbus slave module (usually an I/O or DRAM controller) asserts this signal for one cycle to abort the current transaction before it completes. Asserting scbrtyn also tells the core that the unsuccessful transaction must be retried later. The control state returns to the idle state, and all bus requests are arbitrated again. If there are no other higher-priority requests and sctsen is asserted, there is one idle state between the first transaction and the retry transaction. If scbrdyn and scbrtyn are asserted at the same time, scbrtyn has higher priority.
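The termination priority rules stated for scberrn, scbrtyn, and scbrdyn can be summarized in one function, which might be useful in a bus-functional model or checker. Each argument is the sampled pin level on the relevant rising edge, with 0 meaning asserted because all three signals are active low; the function name is hypothetical.

```python
def scbus_termination(scberrn, scbrtyn, scbrdyn):
    """Resolve how the current SCbus transaction ends this cycle."""
    if scberrn == 0:
        return "error"  # highest priority; CP0 generates an exception
    if scbrtyn == 0:
        return "retry"  # beats scbrdyn; the transaction is retried later
    if scbrdyn == 0:
        return "ready"  # successful termination
    return None         # no termination this cycle
```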
scdip[63:0]    Data Input Bus    Input
scdip[63:0] are the data bus input signals for instruction fetch and data read transactions. The CW4011 samples scdip[63:0] on the rising edge of the clock when scbrdyn is asserted. Byte ordering is little endian. If you are designing a big-endian system, the upper word, scdip[63:32], must be swapped with the lower word, scdip[31:0], outside the core.

scdoen    Data Output Enable    Output
scdoen indicates that the data output signals scdop[63:0] are valid. The CW4011 asserts scdoen throughout a write transaction to indicate that the current transaction is a write and to enable data output.

scdop[63:0]    Data Output Bus    Output
scdop[63:0] are the data output bus signals for data write operations and for data writeback from the D-cache. The signals are valid throughout the write transaction. Byte ordering is little endian. If you are designing a big-endian system, the upper word, scdop[63:32], must be swapped with the lower word, scdop[31:0], outside the core.

schgtn    Bus Hold Grant    Output
The BIU enters the hold state and asserts schgtn to indicate that it is releasing SCbus ownership in response to a bus hold request (schrqn).

schrqn    Bus Hold Request    Input
A low value on schrqn indicates that an external bus master is requesting ownership of the SCbus. The bus hold request has the highest priority during bus arbitration. However, a bus hold request cannot break continuous in-page write or burst read/write transactions if those transactions are supported by an asserted sctsen signal; it must wait until sctsen is deasserted.

scifetn    Instruction Fetch    Output
scifetn indicates that the BIU is fetching instructions. While the BIU is fetching, the core drives scifetn low and outputs it to external logic.
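The big-endian requirement for scdip and scdop is a swap of the two 32-bit halves of the 64-bit bus, performed outside the core. A minimal model of that external swap (the function name is hypothetical):

```python
MASK32 = 0xFFFF_FFFF


def swap_bus_words(bus64):
    """Exchange bits [63:32] and [31:0] of a 64-bit data bus value."""
    bus64 &= (1 << 64) - 1
    return ((bus64 & MASK32) << 32) | (bus64 >> 32)


print(hex(swap_bus_words(0x1122334455667788)))  # 0x5566778811223344
```

In hardware this is pure wiring; applying it twice restores the original value, so the same crossover serves both the input and the output bus.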
sclockn    Bus Lock    Output
The core asserts sclockn to indicate that it wishes to lock the SCbus and restrict bus ownership. The core asserts this signal when an executed LoadLink instruction starts a read transaction in an uncached area or a writethrough cached area. sclockn is deasserted by the execution of a StoreConditional instruction, which occurs just before the write transaction starts. During the read and write transactions, the core asserts sclockn continuously, preventing bus ownership from changing during these transactions.

An incorrect condition can exist if a StoreConditional transaction hits the D-cache in a writeback cached area while sclockn is asserted. In this case, the core deasserts sclockn without completing any bus transaction.

sctben[7:0]    Byte Enables    Output
sctben[7:0] indicate which byte positions are valid for a transaction, as shown below. The core asserts only one of the signals for a byte read or byte write transaction, and asserts all of them for a doubleword or burst transaction. The sctben[7:0] signals are valid when the CW4011 asserts scaoen.

sctben Signal  Valid Byte Positions
0              scdop[7:0]
1              scdop[15:8]
2              scdop[23:16]
3              scdop[31:24]
4              scdop[39:32]
5              scdop[47:40]
6              scdop[55:48]
7              scdop[63:56]

sctbln    Burst Last Doubleword    Output
The core deasserts sctbln while the first, second, and third doublewords of a burst transaction are being read or written; otherwise, the core asserts sctbln. The signal is valid on the rising edge of the system clock.
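The byte-lane table above suggests how sctben[7:0] relates to the access address and size. This sketch assumes naturally aligned accesses and a little-endian mapping in which byte lane i corresponds to address bits [2:0] = i; neither assumption is spelled out in this section, and tribyte or unaligned accesses are not handled. The returned value is the active-low pin pattern (0 = byte enabled); the function name is hypothetical.

```python
SIZES = {"byte": 1, "halfword": 2, "word": 4, "doubleword": 8}


def sctben_for(addr, size):
    """Active-low sctben[7:0] pattern for an aligned access."""
    nbytes = SIZES[size]
    lane = addr & 0x7
    if lane % nbytes:
        raise ValueError("access is not naturally aligned")
    mask = ((1 << nbytes) - 1) << lane  # bit i set => byte lane i valid
    return ~mask & 0xFF                 # invert: asserted lanes read as 0


print(hex(sctben_for(0x1005, "byte")))        # 0xdf: only sctben5 asserted
print(hex(sctben_for(0x1000, "doubleword")))  # 0x0: all eight asserted
```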
sctbstn    Burst Transaction    Output
The core asserts sctbstn while the first doubleword of a four-doubleword transaction is being moved. The core deasserts this signal after the first doubleword has transferred, and does not assert it for a single-word transaction.

sctpwn    Next Transaction Is In-Page Write    Output
The core asserts sctpwn to indicate that the next transaction will be in the same DRAM page, as defined in the CCC register. The LSU write buffer checks whether the subsequent write request is in the same page. When the core asserts this signal, a maximum of four sequential write transactions can occur, even if an instruction fetch request or data read request is pending. If all four write transactions are performed, the core asserts sctpwn for the first three transactions and deasserts it for the fourth. The core asserts sctpwn from the beginning of one in-page write transaction to the end of that transaction.

sctrqn    Transaction Request    Output
The core asserts sctrqn when it needs to generate a transaction, regardless of the bus hold condition. This signal can be used to deassert a bus hold request when the core needs the SCbus for an instruction fetch, data read, or data write transaction.

sctsen    Transaction Start Enable    Input
sctsen enables or disables a new SCbus transaction; transaction requests are arbitrated only when sctsen is asserted. If an idle cycle is desired between two transactions, this signal must be deasserted and then asserted while scbrdyn is asserted. While sctsen is deasserted, the BIU remains in the idle state.

sctssn    Transaction Start Strobe    Output
The core asserts sctssn for one clock cycle at the beginning of a transaction to indicate that a new transaction has begun. If a transaction lasts only one cycle and the next transaction begins immediately, the core asserts sctssn continuously.
7.4 OCAbus Interface

The CW4011 has an on-chip access (OCA) interface that allows on-chip modules to be accessed at the CR stage of the pipeline without involving the SCbus. This reduces latency and improves performance by reducing traffic on the SCbus. The core is the only bus master on the OCAbus, and instructions cannot be fetched through the OCAbus.

If the target module can respond in one clock cycle, there is no penalty for a read or write transaction. By comparison, a read access on the SCbus has at least a four-clock penalty, and a write access is staged through a four-deep write buffer.

Note that the OCAbus interface and the coprocessor interface share the following signals:
- cpfrcdp[31:0]
- cptocdp[31:0]
- cpfrcen
- cptocen
- cpsreqn[3:1]
- pstalln

See Section 7.5, Coprocessor Interface, for more information on the coprocessor signals. The remainder of this section describes the OCA interface signals in detail.

accsizep[1:0]    OCAbus Transaction Size    Output
These signals indicate the size of an OCAbus transaction. They are valid at the EX stage of the pipeline, when the core asserts either exloadp or accstorep.

accsizep[1:0]  Transaction Size
00             One byte
01             Halfword
10             Tribyte
11             One word
Note that data alignment for byte, halfword, or tribyte operations must be performed by either customer logic or software.

accstorep    OCAbus EX Stage Store Operation    Output
The core asserts this signal when a store instruction is being executed in the EX stage of the pipeline, to indicate that dvaddrp[31:0] and accsizep[1:0] are valid. dvaddrp[31:0] is decoded when the core asserts accstorep; if the resulting address belongs to a device on the OCAbus, ocacceptp is asserted.

cpfrcdp[31:0]    Data from OCA    Input
This bus carries data from an OCA device to a core general-purpose register. cpfrcdp[31:0] is valid at the CR stage of the pipeline when the core asserts the data enable signal (cpfrcen). If there are several OCA devices, their data must be multiplexed externally.

cpfrcen    Data from OCA Enable    Output
The core asserts this signal at the CR stage of the pipeline when data on the input bus (cpfrcdp[31:0]) is valid. If the pipeline enters a stall condition while an OCA data movement instruction is in the CR stage, the core asserts cpfrcen continuously until the stall condition is resolved.

cpsreqn3    OCA Stall Request    Input
An OCA device can assert this input at the CR stage of the pipeline to request a pipeline stall. The OCA shares cpsreqn3 with coprocessor 3, if coprocessor 3 is installed. The core asserts pstalln immediately when cpsreqn3 is asserted.

cptocdp[31:0]    Data to OCA    Output
This bus carries data from a general-purpose register in the core to an OCA device. The cptocdp[31:0] signals are valid at the CR stage of the pipeline when the core asserts the data enable signal (cptocen).
cptocen    Data to OCA Enable    Output
The core asserts this signal at the CR stage of the pipeline to indicate that the data output bus (cptocdp[31:0]) is valid. If the pipeline enters a stall condition while an OCA data movement instruction is in the CR stage, the core asserts cptocen continuously until the stall condition is resolved.

crvalidp    OCAbus CR Stage Valid    Output
The core asserts this signal when the CR stage of a load or store instruction is valid, after it has asserted exloadp or accstorep. If the load or store instruction is cancelled, the core deasserts crvalidp, and the load/store operation must be cancelled.

dvaddrp[31:0]    OCAbus Virtual Address    Output
This is the output bus for the OCAbus virtual address. It holds either the source address of a load instruction or the destination address of a store instruction in the EX stage. The bus is valid only during the EX stage of the pipeline, when the core asserts either exloadp or accstorep.

exloadp    OCAbus EX Stage Load Operation    Output
The core asserts this signal when a load instruction is being executed in the EX stage of the pipeline, to indicate that dvaddrp[31:0] and accsizep[1:0] are valid. dvaddrp[31:0] is decoded when the core asserts exloadp; if the resulting address belongs to a device on the OCAbus, ocacceptp is asserted.

ocacceptp    OCAbus Transaction Accepted    Input
The OCA module asserts this signal when it is ready to accept an OCA transaction. ocacceptp is an output of the dvaddrp address decoder and is asserted at the CR pipeline stage. When ocacceptp is asserted for a read operation, the LSU selects cpfrcdp[31:0] as the data input. When ocacceptp is asserted for a write operation, the data sent to the OCA device on cptocdp[31:0] is valid during the CR stage; the data is therefore not written into the D-cache, and no SCbus transaction is requested.
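The external dvaddrp decoder that produces ocacceptp can be sketched as a combinational function. The device window list is entirely hypothetical (a real design defines its own OCA address map), and this model ignores timing: in hardware the decode happens on the EX-stage address and ocacceptp is presented at the CR stage, which would typically mean registering the result. The accsizep decoding follows the table above.

```python
ACCSIZE_BYTES = {0b00: 1, 0b01: 2, 0b10: 3, 0b11: 4}  # accsizep[1:0]

# Hypothetical OCA device map: (base address, size in bytes) windows.
EXAMPLE_OCA_WINDOWS = [(0x1F00_0000, 0x1000)]


def oca_accept(dvaddrp, exloadp, accstorep, accsizep, windows=EXAMPLE_OCA_WINDOWS):
    """Decode whether an EX-stage access targets an OCA device."""
    if not (exloadp or accstorep):
        return False  # dvaddrp and accsizep are only valid with a strobe
    nbytes = ACCSIZE_BYTES[accsizep & 0b11]
    last = dvaddrp + nbytes - 1
    return any(base <= dvaddrp and last < base + size for base, size in windows)
```

Checking the last byte as well as the first keeps an access that straddles the end of a device window from being accepted.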
pstalln    Pipeline Stall Broadcast    Output
The core asserts this signal to indicate that all pipeline stages are stalled. This signal is valid during any stage of the pipeline.

7.5 Coprocessor Interface

This section describes the coprocessor interface signals, which interface with the ISU and LSU. Contact LSI Logic if your design requires additional coprocessors (other than CP0).

brlikfn    Branch Likely If Even Slot Is False    Output
The core asserts brlikfn when a branch likely instruction is in an even slot and the branch is not taken. If a coprocessor has a valid instruction in the EX stage at this time, that instruction must be cancelled. It is not necessary to check whether the instruction in the EX stage is in an even or odd slot, because the core asserts brlikfn only when the branch likely instruction is in the even slot. If the branch likely instruction in the even slot is not taken, the instruction in the odd slot must be nullified, even if it has already started.

cpbusyn[3:1]    Coprocessor Busy    Input
These inputs are asserted when an external coprocessor is busy and cannot accept a coprocessor operation. The ISU does not assert the execution strobe signals, cpxstbn[3:1], while the related cpbusyn signal is asserted, and the core stalls until the busy signal is deasserted. Each coprocessor is independent and asserts its busy signal from the EX stage. The core examines the cpbusyn signals at the RD stage of the pipeline, on the rising edge of the system clock.

cpbusyn Signal  Busy Coprocessor
1               CP1
2               CP2
3               CP3
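The cpbusyn[3:1] table and the rule that the ISU withholds cpxstbn for a busy coprocessor can be modelled as follows. The 3-bit argument packs cpbusyn[3:1] with cpbusyn[1] in bit 0; that packing and the function names are illustrative choices, not part of the interface definition.

```python
def busy_coprocessors(cpbusyn):
    """Coprocessor numbers whose active-low busy input is asserted (0)."""
    return [z for z in (1, 2, 3) if not (cpbusyn >> (z - 1)) & 1]


def may_strobe(z, cpbusyn):
    """True if the ISU may assert cpxstbn[z] for coprocessor z this cycle."""
    return z not in busy_coprocessors(cpbusyn)


print(busy_coprocessors(0b101))  # [2]: only cpbusyn2 is low
```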
cpcodep[31:0]: CP Instruction Code Bus (Output)
This bus outputs the entire instruction bit field at the RD stage. It is valid when the core asserts one of the cpxstbn[3:1] lines. Although the core can execute two instructions per cycle, only one coprocessor instruction can be issued per cycle, so cpcodep[31:0] carries the instruction selected from either the even or the odd instruction slot. External logic must sample the bus on the rising edge of the system clock when the core asserts the strobe signal (cpxstbn). It is not necessary to decode all bits of the instruction, because the execution strobe signal is itself a partial decode.

cpcondp[3:0]: Coprocessor Condition (Input)
These inputs are used by the coprocessor conditional branch instructions. The ISU samples them at the EX stage of a conditional branch instruction. The four cpcondp inputs are associated with the four possible coprocessors (CP0 to CP3). Because CP0 does not need a condition input, cpcondp0 is used as a general-purpose condition input.

  cpcondp signal   Coprocessor condition
  0                CP0
  1                CP1
  2                CP2
  3                CP3

cpfixupn: Data Fix-Up Cycle Strobe for LWCz Cache Miss (Output)
The core asserts cpfixupn when correct data is output on cptocdp[31:0] during a fix-up cycle. It asserts the signal during stall cycles, because an LWCz cache miss stalls the pipeline until the data is read.

cpfrcdp[31:0]: Data from Coprocessor (Input)
This bus carries data from a coprocessor register to a general-purpose core register or to memory. Data on the bus is valid when the core asserts the data enable signal
(cpfrcen). The core samples cpfrcdp[31:0] at the CR stage of the pipeline. If there are several external coprocessors, the data bus must be multiplexed outside the CW4011.

cpfrcen: Data from Coprocessor Enable (Output)
The core asserts this signal to enable the data input bus cpfrcdp[31:0]. Coprocessors can derive the same information from the instruction code (cpcodep) by tracking the pipeline stage; external logic must decode the coprocessor number from cpcodep[31:0]. If the pipeline enters a stall condition while a coprocessor data movement instruction is in the CR stage, the core asserts cpfrcen continuously until the stall condition is resolved.

cpmissn: Data Cache Miss Strobe for LWCz (Output)
The core asserts cpmissn at the CR stage of an LWCz instruction when a D-cache miss occurs. The data at the CR stage is not correct; the correct data is put on cptocdp[31:0] during a later fix-up cycle. The core asserts pstalln from the WB stage of the LWCz instruction.

cprstn[3:1]: Coprocessor Reset (Output)
These outputs reflect the state of the CU[3:1] bits in the CP0 Status register: if a CU bit is 0, the core asserts the corresponding cprstn output. A cold reset clears the CU bits, so the core asserts the cprstn[3:1] signals at that time; a warm reset does not clear the CU bits. The cprstn[3:1] outputs allow the system designer to reset external coprocessors under software control.

  cprstn signal   CU bit
  1               CU[1]
  2               CU[2]
  3               CU[3]

cpsreqn[3:1]: Coprocessor Stall Request (Input)
The external coprocessors assert these signals when they need to request a pipeline stall. Coprocessors can assert cpsreqn[3:1] while a previous coprocessor
instruction is being executed, after decoding a coprocessor instruction, and after the RD stage. When one of the cpsreqn[3:1] signals is asserted, the core asserts pstalln.

  cpsreqn signal   Coprocessor requesting stall
  1                CP1
  2                CP2
  3                CP3

cptocdp[31:0]: Data to Coprocessor (Output)
This bus carries data to a coprocessor register from a general-purpose core register or from memory. Data on this bus is valid at the CR stage of the pipeline when the core asserts the data enable signal (cptocen).

cptocen: Data to Coprocessor Enable (Output)
The core asserts this signal to indicate when the data output bus, cptocdp[31:0], is valid at the CR stage of the pipeline. Coprocessors can derive the same information from the instruction code (cpcodep) by tracking the pipeline stage; the coprocessor number must be decoded from cpcodep[31:0]. If the pipeline enters a stall condition while a coprocessor data movement instruction is in the CR stage, the CW4011 asserts cptocen continuously until the stall condition is resolved.

cpxoddn: Coprocessor Instruction at Odd Slot (Output)
When the core asserts an execution strobe, it also asserts cpxoddn at the RD stage of the pipeline if the coprocessor instruction is in the odd slot. The coprocessor must carry this information down its pipeline until the CR stage, where it is used to decide whether the instruction should be cancelled when the cancellation signal is asserted.

cpxstbn[3:1]: Coprocessor Instruction Execution Strobe (Output)
These strobe signals indicate the start of a coprocessor operation that involves data movement. The core asserts only one of the signals during a clock cycle. The cpxstbn[3:1] signals are partial decoding signals for an
instruction. The ISU also uses the signals to check for resource conflicts, including coprocessor busy signals. The cpxstbn[3:1] signals are valid at the RD stage of the pipeline.

  cpxstbn signal   Coprocessor
  1                CP1
  2                CP2
  3                CP3

fpeoddn: FPU Error Exception in Odd Slot (Input)
fpeoddn indicates whether the instruction that caused an FPU exception (fperrxn assertion) was in an even slot (fpeoddn high) or an odd slot (fpeoddn low) when it started at the RD stage. The core ignores fpeoddn while fperrxn is deasserted. When an instruction starts at the RD stage, the cpxoddn signal tells the coprocessor whether the instruction is in the even or the odd slot. To handle a pipeline cancel correctly, the coprocessor must keep the instruction, with its slot information, in its pipeline registers. To make FPU exceptions precise, a coprocessor that asserts fperrxn at the EX stage must drive fpeoddn according to the even/odd status of the instruction in its EX stage.

fperrxn: Floating-Point Unit Error Exception (Input)
fperrxn is an exception input intended specifically for an FPU coprocessor. The core samples the signal at any time in the EX stage and issues a pipeline cancel at the CR stage, in the same way as for exintn. If this is the highest-priority exception, the Cause register shows exception code 15. fperrxn can also serve as a user-defined coprocessor exception input. fperrxn must be handled precisely: the FPU asserts fperrxn at the EX stage of the instruction and drives fpeoddn accordingly, and the core asserts the pipeline cancel signal at the CR stage with the correct even/odd cancel indication.

pcancrn: Pipeline Cancel at CR Stage (Output)
When one or more exceptions occur, the pipeline is cancelled at the CR stage and the core asserts pcancrn. Coprocessor pipelines must be cancelled to
prevent a second execution of the coprocessor instruction under either of the following conditions: when the coprocessor returns from an exception handler, or when the coprocessor has finished executing an LWCz instruction that caused a TLB miss. The WB stage is not cancelled when pcancrn is asserted. pcancrn is valid at the CR stage of the pipeline.

pcanoddn: Pipeline Cancel Is for Odd Slot (Output)
pcanoddn is valid only when pcancrn is asserted. It tells coprocessors whether the cancellation applies to the odd slot only or to both slots. When the core asserts pcanoddn, cancellation applies to the odd slot; when it deasserts pcanoddn, cancellation applies to both the even and odd slots. The coprocessor must track which slot its instruction occupies by using the cpxoddn signal. When the core asserts both pcancrn and pcanoddn and the coprocessor instruction is in the odd slot, the instruction must be cancelled. When the core asserts pcancrn and deasserts pcanoddn, the coprocessor instruction must be cancelled regardless of its slot. This signal is valid at the CR stage of the pipeline.

pstalln: Pipeline Stall Signal (Output)
The core asserts this signal to indicate that the entire pipeline is stalled. Coprocessor pipelines that are executing instructions must also stall. The core asserts pstalln for all pipeline stalls, including an LWCz instruction D-cache miss.

suspexn: Suspend EX Stage (Output)
The ISU asserts suspexn to request that coprocessors suspend the instruction in the EX stage. The instruction in the EX stage must be held until the ISU deasserts suspexn. Instructions in the CR and WB stages must be completed.
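The cancellation rule for a coprocessor pipeline reduces to a small decision function. This Python sketch assumes pcancrn, pcanoddn, and cpxoddn are active low (0 = asserted); the class CopSlotTracker and its methods are hypothetical names for coprocessor-side logic, not CW4011 signals.

```python
ASSERTED = 0  # pcancrn, pcanoddn, and cpxoddn are assumed to be active low


class CopSlotTracker:
    """Hypothetical coprocessor-side bookkeeping for one in-flight instruction."""

    def __init__(self) -> None:
        self.in_odd_slot = False

    def on_rd_strobe(self, cpxoddn: int) -> None:
        # Latch the slot at RD and carry it down to CR, as the cpxoddn entry requires
        self.in_odd_slot = cpxoddn == ASSERTED

    def must_cancel(self, pcancrn: int, pcanoddn: int) -> bool:
        """Apply the pcancrn/pcanoddn rule at the CR stage."""
        if pcancrn != ASSERTED:
            return False  # no cancellation this cycle
        if pcanoddn == ASSERTED:
            return self.in_odd_slot  # cancellation applies to the odd slot only
        return True  # cancellation applies to both slots
```

Keeping the slot bit next to the instruction, rather than recomputing it at CR, mirrors the requirement that the coprocessor carry cpxoddn down its own pipeline.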
7.6 Cache Invalidation Interface

This section describes the cache invalidation interface signals, which connect to the ISU, LSU, and CP0.

cinvap[31:5]: Cache Invalidation Address Bus (Input)
cinvap[31:5] is the address input bus for D-cache and I-cache invalidation. When an external bus master writes data into main memory, the address must be checked against the D-cache and the I-cache; if the address is cached, the line must be invalidated. The core samples this bus when either dcinvsn or icinvsn is asserted.

dcinvsn: D-Cache Invalidation Strobe (Input)
When asserted, dcinvsn indicates that the cache invalidation address bus is valid and that a D-cache snooping sequence is required. If the cache tag does not match the upper address bits, the line is not invalidated.

icinvsn: I-Cache Invalidation Strobe (Input)
When asserted, icinvsn indicates that the cache invalidation address bus is valid and that an I-cache snooping sequence is required. If the cache tag does not match the upper address bits, the line is not invalidated.

7.7 Data Cache Interface

These signals connect the CW4011 to the D-cache memory. If a design uses a one-way set associative cache, the signals for the second set should be tied low or high, whichever deasserts them. The same applies to a no-cache configuration, in which both signal sets must be deasserted.

7.7.1 D-Cache Tag RAM Signals

dctagaddrp[12:5]: D-Cache Tag Address (Output)
This bus carries the lower bits (bits [12:5]) of the virtual address for data load/store operations, and is the
portion of the address that selects the cache block. dctagaddrp[12:5] addresses the tag RAM for read, write, or update operations, and connects to the tag RAMs of both D-cache set 0 and D-cache set 1. The following table lists the valid bits for each D-cache RAM size.

  D-cache size (Kbytes)   Valid dctagaddrp bits
  8                       [12:5]
  4                       [11:5]
  2                       [10:5]
  1                       [9:5]

dctagdip[23:0]: D-Cache Tag Data In (Output)
For both D-cache set 0 and D-cache set 1, the core drives dctagdip[23:0] with the upper 22 bits of the physical address (bits [31:10]) concatenated with the valid and dirty bits. dctagdip[1] contains the valid bit; dctagdip[0] contains the dirty bit, which may be ignored under a writethrough policy.

dc0tagdop[23:0]: D-Cache Set 0 Tag Data Out (Input)
D-cache set 0 (in a two-way set associative cache) drives these signals with the appropriate tag contents.

dc0tagwep[1:0]: D-Cache Set 0 Tag Write Enable (Output)
The core asserts dc0tagwep1 to signal the D-cache set 0 tag RAM to write the cache entry tag and the valid bit (dctagdip[23:1]) into the location addressed by dctagaddrp[12:5]. Both dctagaddrp[12:5] and dctagdip[23:1] are valid at this time. The core asserts dc0tagwep0 to signal the tag RAM to write the value of dctagdip0 (the dirty bit) into the location addressed by dctagaddrp[12:5].

dc1tagdop[23:0]: D-Cache Set 1 Tag Data Out (Input)
D-cache set 1 (in a two-way set associative cache) drives these signals with the appropriate tag contents.
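To make the tag RAM layout concrete, the following Python sketch derives the tag index from the D-cache size table and packs dctagdip[23:0] as described (PA[31:10], valid, dirty). The snoop check at the end only illustrates the 7.6 rule that a line is invalidated when its tag matches the upper address bits; it assumes the comparison uses cinvap bits [31:10], which the manual does not spell out. All function names are hypothetical.

```python
from typing import Tuple

# D-cache set size (Kbytes) -> most significant dctagaddrp bit in use (table above)
INDEX_MSB = {1: 9, 2: 10, 4: 11, 8: 12}
TAG_MASK = (1 << 22) - 1  # 22 tag bits: PA[31:10]


def dctag_index(vaddr: int, size_kb: int) -> int:
    """Tag RAM index taken from dctagaddrp[msb:5] for the given set size."""
    width = INDEX_MSB[size_kb] - 5 + 1
    return (vaddr >> 5) & ((1 << width) - 1)


def pack_dctagdip(paddr: int, valid: bool, dirty: bool) -> int:
    """dctagdip[23:2] = PA[31:10], dctagdip[1] = valid, dctagdip[0] = dirty."""
    return (((paddr >> 10) & TAG_MASK) << 2) | (int(valid) << 1) | int(dirty)


def unpack_dctagdop(word: int) -> Tuple[int, bool, bool]:
    """Split a tag RAM word back into (tag, valid, dirty)."""
    return (word >> 2) & TAG_MASK, bool((word >> 1) & 1), bool(word & 1)


def snoop_invalidates(tag_word: int, cinvap_addr: int) -> bool:
    """Illustrative snoop check: invalidate only a valid line whose tag matches."""
    tag, valid, _dirty = unpack_dctagdop(tag_word)
    return valid and tag == ((cinvap_addr >> 10) & TAG_MASK)
```

For an 8-Kbyte set all eight index bits are used; for a 1-Kbyte set only bits [9:5] are, which is why the same address yields a smaller index.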
dc1tagwep[1:0]: D-Cache Set 1 Tag Write Enable (Output)
When the core asserts dc1tagwep1, the D-cache set 1 tag RAM writes the cache entry tag and the valid bit (dctagdip[23:1]) into the location addressed by dctagaddrp[12:5]. Both dctagaddrp[12:5] and dctagdip[23:1] are valid at this time. When the core asserts dc1tagwep0, the tag RAM writes the value of dctagdip0 (the dirty bit) into the location addressed by dctagaddrp[12:5].

7.7.2 D-Cache Data RAM Signals

dcadatadip[31:0]: D-Cache Data Bank A RAM Data In (Output)
These signals drive data onto the data bus inputs of the D-cache data bank A RAM. dcdataaddrp[12:3] and dcadataaddrp address the bank A data RAMs.

dcadatadop[31:0]: D-Cache Data Bank A RAM Data Out (Input)
These signals receive data from the D-cache data bank A RAM. dcdataaddrp[12:3] and dcadataaddrp address the bank A data RAMs. The RAM should continuously output the data for the given address.

dcadatawep[3:0]: D-Cache Data Bank A Write Enable (Output)
These signals control the byte write enables of the D-cache data bank A RAM. When a dcadatawep bit is asserted, the RAM must write the corresponding byte of dcadatadip[31:0] into memory; if all dcadatawep bits are asserted, the RAM writes the entire word. For example, if only dcadatawep[0] is asserted, the first byte is written as normal. The other three bytes of
dcadatadip[31:0] should be ignored, and the data RAM should retain its previous contents for those three bytes.

  dcadatawep bit   Byte     dcadatadip bits
  0                First    [7:0]
  1                Second   [15:8]
  2                Third    [23:16]
  3                Fourth   [31:24]

dcbdatadip[31:0]: D-Cache Data Bank B RAM Data In (Output)
These signals drive core data onto the data bus inputs of the D-cache data bank B RAM. dcdataaddrp[12:3] and dcbdataaddrp address the data within the bank B data RAM.

dcbdatadop[31:0]: D-Cache Data Bank B RAM Data Out (Input)
This data bus transfers data from the D-cache set 1 data RAM to the CW4011 core. The RAM should continuously output the data for the given address.

dcbdatawep[3:0]: D-Cache Data Bank B Write Enable (Output)
These signals control the byte write enable inputs of the D-cache data bank B RAM. When a dcbdatawep bit is asserted, the RAM must write the corresponding byte of dcbdatadip[31:0] into memory; if all dcbdatawep bits are asserted, the RAM writes the entire word. For example, if only dcbdatawep[0] is asserted, the first byte is written as normal. The other three bytes of dcbdatadip[31:0] should be ignored, and the data RAM should retain its previous contents for those three bytes.

  dcbdatawep bit   Byte     dcbdatadip bits
  0                First    [7:0]
  1                Second   [15:8]
  2                Third    [23:16]
  3                Fourth   [31:24]
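A behavioral Python model of one D-cache data bank, showing how the byte write enables leave unselected bytes untouched. It assumes an 11-bit word address in which dcdataaddrp[12:3] supplies the upper 10 bits and the bank LSB signal (dcadataaddrp or dcbdataaddrp) supplies bit 0, and it assumes the p-suffixed write enables are active high. Class and function names are hypothetical.

```python
from typing import List


def data_ram_address(dcdataaddrp: int, bank_lsb: int) -> int:
    """11-bit word address: dcdataaddrp[12:3] forms the upper 10 bits, the bank LSB is bit 0."""
    return ((dcdataaddrp & 0x3FF) << 1) | (bank_lsb & 1)


def merge_bytes(old: int, din: int, wep: int) -> int:
    """Write only the bytes whose enable bit is set (bit i -> data bits [8i+7:8i])."""
    result = old
    for i in range(4):
        if (wep >> i) & 1:
            mask = 0xFF << (8 * i)
            result = (result & ~mask) | (din & mask)
    return result & 0xFFFF_FFFF


class DataBankRam:
    """Behavioral model of one 2048 x 32-bit D-cache data bank (8 Kbytes)."""

    def __init__(self, words: int = 2048) -> None:
        self.mem: List[int] = [0] * words

    def read(self, dcdataaddrp: int, bank_lsb: int) -> int:
        # The RAM output continuously reflects the addressed word
        return self.mem[data_ram_address(dcdataaddrp, bank_lsb)]

    def write(self, dcdataaddrp: int, bank_lsb: int, din: int, wep: int) -> None:
        addr = data_ram_address(dcdataaddrp, bank_lsb)
        self.mem[addr] = merge_bytes(self.mem[addr], din, wep)
```

Writing 0xAABBCCDD with only bit 0 of the enables set over a stored 0x11223344 leaves 0x112233DD, matching the "first byte only" example in the text.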
dcdataaddrp[12:3]: D-Cache Data Address (Output)
This bus holds the upper 10 bits of the address of a memory location within the D-cache data RAMs. Both data RAMs are addressed with an 11-bit address: dcdataaddrp[12:3] connects to bits [10:1] of the RAM address bus of both bank A and bank B, dcadataaddrp connects to bit 0 of the bank A data RAM address, and dcbdataaddrp connects to bit 0 of the bank B data RAM address. The dcdataaddrp[12:3] signals are valid together with dcadataaddrp, dcbdataaddrp, and the write enable signals (dcadatawep and dcbdatawep). Note that the data RAM address bus is up to 11 bits wide and the RAM is 32 bits wide.

dcadataaddrp: D-Cache Data Bank A Address LSB (Output)
This signal connects to the least-significant bit of the D-cache data bank A RAM address bus (bit 0 of the address input). The concatenation of dcdataaddrp[12:3] and dcadataaddrp selects which word is brought into the core from the cache.

dcbdataaddrp: D-Cache Data Bank B Address LSB (Output)
This signal connects to the least-significant bit of the D-cache data bank B RAM address bus (bit 0 of the address input). The concatenation of dcdataaddrp[12:3] and dcbdataaddrp selects which word is brought into the core from the cache.

7.8 Instruction Cache Interface

The signals described in this section connect the CW4011 to the I-cache memory. The descriptions assume a two-way set associative cache, with the sets referred to as I-cache set 0 and I-cache set 1. The correct cache configuration must be set in the CCC register, and any buses or signals that are not needed must be deasserted by tying them high or low.
The I-cache tag comparators are built into the core, so external tag comparators are needed only for the D-cache.

7.8.1 I-Cache Tag RAM Signals

itfenstp: Tag Fetch Enable Strobe (Output)
The core asserts itfenstp to signal that it is performing a store or read operation on one of the I-cache tag RAMs.

ictagdip[22:0]: New Tag to Instruction Tag RAM (Output)
This bus carries the tag data and a valid bit. Data on this bus should be written into the location specified by itaddrp[7:0]: into I-cache set 0 if ic0tagwep is asserted, and into I-cache set 1 if ic1tagwep is asserted.

ictagrdp: Tag Read (Output)
ictagrdp is the read enable signal for both I-cache sets. The core asserts this signal together with itfenstp to tell the I-cache tag RAMs to place the data selected by itaddrp[7:0] on the ic0tagdop[22:0] and ic1tagdop[22:0] buses.

ic0tagdop[22:0]: I-Cache Tag Data Out Set 0 (Input)
The I-cache set 0 tag RAM drives the tag data onto this input bus. When both itfenstp and ictagrdp are asserted, the tag RAM outputs the contents of the location pointed to by itaddrp[7:0] onto the ic0tagdop[22:0] bus.

ic0tagwep: Tag Write Set 0 (Output)
When itfenstp is asserted and the core asserts ic0tagwep, the I-cache set 0 tag RAM writes the data from ictagdip into the memory location specified by itaddrp[7:0]. Both ictagdip and itaddrp[7:0] are valid during this write transaction.

ic1tagdop[22:0]: I-Cache Tag Data Out Set 1 (Input)
The I-cache set 1 tag RAM drives tag data onto this input bus. When both itfenstp and ictagrdp are asserted, the tag RAM should output the contents of the location
pointed to by itaddrp[7:0] onto the ic1tagdop[22:0] bus.

ic1tagwep: Tag Write Set 1 (Output)
When itfenstp is asserted and the core asserts ic1tagwep, the I-cache set 1 tag RAM writes the data from ictagdip into the memory location specified by itaddrp[7:0]. Both ictagdip and itaddrp[7:0] are valid during this write transaction.

itaddrp[7:0]: Tag Address (Output)
The core drives these signals with the eight lower bits of the virtual address. itaddrp[7:0] addresses the I-cache tag RAMs. If itfenstp and ictagrdp are asserted, I-cache set 0 should output the data selected by this address onto ic0tagdop[22:0], and I-cache set 1 should output the data selected by this address onto ic1tagdop[22:0]. itaddrp[7:0] also addresses the I-cache set 0 and set 1 tag RAMs for write operations.

7.8.2 I-Cache RAM Signals

icaddrp[9:0]: I-Cache Address (Output)
The core drives these signals with the addresses for read/write operations to the I-cache data RAMs.

icdatadip[63:0]: Data to I-Cache (Output)
These signals hold the instructions to be written into the I-cache data RAM during a write operation.

icdatardp: I-Cache Read Indicator (Output)
The core asserts this signal to indicate that the current operation on the I-cache data RAM is a read. If icdatardp and icfenstp are both asserted, or if maincycp is asserted, the instructions from the instruction RAMs of both I-cache set 0 and I-cache set 1 should be placed on the ic0datadop[63:0] and ic1datadop[63:0] buses.

icfenstp: Cache Fetch Enable Strobe (Output)
The core asserts this signal to inform the instruction RAM that a read or write operation is occurring. The RAM or glue logic should check and perform the read/write operation. Data from the core is valid on the signals and buses while icfenstp is asserted.

ic0datadop[63:0]: I-Cache Data In Set 0 (Input)
The core reads instructions from I-cache set 0 on this bus. The instruction data RAM and glue logic must provide a valid instruction before the next cycle, or an error may result.

ic0datawehp: Upper Word I-Cache Write Enable Set 0 (Output)
The core asserts ic0datawehp to enable the I-cache set 0 instruction RAM to write the value on the upper 32 bits of icdatadip[63:0] to the location selected by icaddrp[9:0]. If ic0datawehp is not asserted, the upper 32 bits (bits [63:32]) of the location selected by icaddrp[9:0] in I-cache set 0 must remain unchanged. The write occurs only if icfenstp or maincycp is asserted at the same time as ic0datawehp.

ic0datawelp: Lower Word I-Cache Write Enable Set 0 (Output)
The core asserts ic0datawelp to enable the I-cache set 0 instruction RAM to write the value on the lower 32 bits of icdatadip[63:0] to the location selected by icaddrp[9:0]. If ic0datawelp is not asserted, the lower 32 bits (bits [31:0]) of the location addressed by icaddrp[9:0] in I-cache set 0 must remain unchanged. The write occurs only if icfenstp or maincycp is asserted at the same time.

ic1datadop[63:0]: I-Cache Data In Set 1 (Input)
The core reads instructions from I-cache set 1 on this bus. The instruction RAM and glue logic must provide a valid instruction before the next cycle.

ic1datawehp: Upper Word I-Cache Write Enable Set 1 (Output)
The core asserts ic1datawehp to enable the I-cache set 1 instruction RAM to write the value on the upper 32 bits of icdatadip[63:0] to the location selected by icaddrp[9:0]. If ic1datawehp is not asserted, the
value of the upper 32 bits (bits [63:32]) of the location addressed by icaddrp[9:0] must remain unchanged. The write occurs only if icfenstp or maincycp is asserted at the same time as ic1datawehp.

ic1datawelp: Lower Word I-Cache Write Enable Set 1 (Output)
The core asserts ic1datawelp to enable the I-cache set 1 instruction RAM to write the value on the lower 32 bits of icdatadip[63:0] to the location selected by icaddrp[9:0]. If ic1datawelp is not asserted, the lower 32 bits (bits [31:0]) of the location addressed by icaddrp[9:0] must remain unchanged. The write occurs only if icfenstp or maincycp is asserted at the same time as ic1datawelp.

maincycp: I-Cache Data Maintenance Mode (Output)
The core asserts this signal to inform the instruction RAM that the processor is operating in isolate-cache maintenance mode.

7.8.3 I-Cache Least Recently Used (LRU) RAM Signals

iclrurdp: I-Cache LRU Read Data (Input)
The core uses this signal to read data held in LRU memory during an LRU RAM access. The LRU RAM should drive the value addressed by lruaddrp[7:0] onto this input when both iclrurep and lrufenstp are asserted.

iclrurep: Read Strobe to LRU RAM (Output)
The core asserts this signal to indicate that the current LRU RAM operation is a read. If both iclrurep and lrufenstp are asserted, the core reads the data selected by lruaddrp[7:0] on iclrurdp.

iclruwdp: LRU Write Data (Output)
The core drives this signal with the data to be written into the LRU RAM in a store operation. The core drives the store address on lruaddrp[7:0] and asserts iclruwep and lrufenstp to indicate a store operation.

iclruwep: Write Strobe to LRU RAM (Output)
The core asserts this signal to indicate that the current LRU RAM operation is a write. If both iclruwep and
lrufenstp are asserted, data should be written from iclruwdp into the location selected by lruaddrp[7:0].

lruaddrp[7:0]: LRU Address (Output)
The core drives these signals with the address for read/write operations to the LRU RAM.

lrufenstp: LRU Fetch Enable Strobe (Output)
The core asserts lrufenstp to indicate that a load or store operation on the LRU RAM is occurring. Depending on the read (iclrurep) or write (iclruwep) enable, lruaddrp[7:0] addresses the memory location for a read or write transaction.

7.9 Writeback Buffer Interface

The CW4011 core provides a simple interface for attaching a doubleword writeback buffer. To add a writeback buffer to the design, a dedicated RAM must be added, along with glue logic that controls the I/O between the core and the writeback buffer. This RAM should consist of four 64-bit registers plus control logic. The following signals connect the core to the writeback buffer.

realwbp: Real Writeback Buffer Installed (Input)
Asserting this signal informs the core that a fully functional writeback buffer is installed.

wbbrap[1:0]: Writeback Buffer Read Address (Output)
These signals tell the writeback buffer which of its four slots to return data from. The writeback buffer returns the data on wbbrdp[63:0].

  wbbrap[1:0]   Writeback buffer slot
  00            0
  01            1
  10            2
  11            3
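The four-slot writeback buffer can be modeled in Python roughly as follows. The sketch covers both the read side (wbbrap, wbbrdp) and the write-side signals wbbwap, wbbwep, and wbbselabp documented in the entries that follow. It assumes active-high p signals and treats x_word as the word that lands in the upper half in the default xy order; which D-cache bus supplies x or y is not stated in this section. The class name is hypothetical.

```python
from typing import List


class WritebackBuffer:
    """Hypothetical four-slot, 64-bit writeback buffer model."""

    SLOTS = 4

    def __init__(self) -> None:
        self.slots: List[int] = [0] * self.SLOTS

    def write(self, wbbwep: int, wbbwap: int, x_word: int, y_word: int, wbbselabp: int) -> None:
        # Store only when the write enable is asserted (in hardware, at the next sclkp edge)
        if not wbbwep:
            return
        x, y = x_word & 0xFFFF_FFFF, y_word & 0xFFFF_FFFF
        if wbbselabp:
            x, y = y, x  # yx order: swap which word lands in the upper half
        self.slots[wbbwap & 0b11] = (x << 32) | y

    def read(self, wbbrap: int) -> int:
        # wbbrdp[63:0]: contents of the slot selected by wbbrap[1:0]
        return self.slots[wbbrap & 0b11]
```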
wbbrdp[63:0]: Writeback Buffer Read Data (Input)
The writeback buffer drives these signals with the data held in the buffer slot indicated by wbbrap[1:0].

wbbselabp: Writeback Buffer Order (Output)
The core asserts this signal to tell the writeback buffer to flip the word order during a write transaction. By default, the words are written in xy order, where x is the most-significant word and y is the least-significant word. Asserting wbbselabp reverses the word order to yx.

wbbwap[1:0]: Writeback Buffer Write Address (Output)
These signals tell the writeback buffer which of its four doubleword slots to store data in. The writeback buffer should write data into a slot only if wbbwep is asserted. Write data arrives from D-cache set 0 through dcadatadop and from D-cache set 1 through dcbdatadop.

  wbbwap[1:0]   Writeback buffer slot
  00            0
  01            1
  10            2
  11            3

wbbwep: Writeback Buffer Write Enable (Output)
The core asserts this signal to tell the writeback buffer to write data into the slot indicated by wbbwap[1:0] at the next sclkp edge. The word order is determined by wbbselabp, and the target slot by wbbwap[1:0].

7.10 Memory Management Unit (MMU) Interface

The CW4011 core offers a set of signals for interfacing with either a user-designed MMU or an LSI Logic CoreWare building block. The core has a built-in coprocessor 0 (CP0) that handles virtual memory addressing and exception handling, but the user must provide a TLB and a real MMU to implement a fully functional MIPS 3000/4000 microprocessor. Please
note that exceptions may be generated at any time in the IF, Q, EX, or CR pipeline stages, but are always serviced in the CR stage.

accstorep: Data Access Is a Store Request (Output)
The core asserts this signal when a store instruction is being executed in the EX stage of the pipeline. dvaddrp[31:0], dvaddrvp, and accsizep[1:0] are valid at the same time. If accstorep is deasserted, the data access is a fetch operation.

badvpnp[31:12]: Failing Virtual Address for TLB Exceptions (Output)
These signals output the virtual page number, stored in the EntryHi and Context registers, that caused a TLB exception. badvpnp[31:12] is the upper 20 bits of the virtual address that caused the exception.

cachewbp: Data Access Writeback Mode (Input)
The MMU asserts this signal to inform the core that the associated CR-stage store transaction should be completed as a writeback instead of a writethrough. The following table lists typical operation:

  CCC register TE WB   Memory segment   Mode
  x0                   kseg0            Writethrough
  x1                   kseg0            Writeback
  00                   kuseg, kseg2     Writethrough
  01                   kuseg, kseg2     Writeback
  1x                   kuseg, kseg2     From TLB entry

cfgdsp[1:0]: Configuration D-Cache Set Size (Output)
These signals output information from the CCC register indicating the D-cache set size (1, 2, 4, or 8 Kbytes). If the CCC register indicates that no cache is installed, this bus is undefined.

  cfgdsp[1:0]   D-cache size (Kbytes)
  00            1
  01            2
  10            4
  11            8
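The cachewbp table and the cfgdsp encoding reduce to two small decode functions. In this Python sketch, tlb_says_writeback is a hypothetical stand-in for the cache attribute of the matching TLB entry (the 1x rows), which is not modeled; kseg1 is rejected because it is uncached and has no row in the table.

```python
def store_policy(te: int, wb: int, segment: str, tlb_says_writeback: bool = False) -> str:
    """Typical cachewbp behavior from the CCC TE/WB bits and the memory segment.

    tlb_says_writeback is a hypothetical stand-in for the TLB entry's cache
    attribute, used only when TE = 1 for kuseg/kseg2.
    """
    if segment == "kseg0":
        return "writeback" if wb else "writethrough"  # TE is a don't-care here
    if segment in ("kuseg", "kseg2"):
        if te:
            return "writeback" if tlb_says_writeback else "writethrough"
        return "writeback" if wb else "writethrough"
    raise ValueError(f"no cachewbp table entry for segment {segment!r}")


def cfgdsp_kbytes(cfgdsp: int) -> int:
    """Decode cfgdsp[1:0]: 00 -> 1, 01 -> 2, 10 -> 4, 11 -> 8 Kbytes."""
    return 1 << (cfgdsp & 0b11)
```

An MMU would then assert cachewbp when store_policy(...) returns "writeback".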
cfgiscp: Configuration Isolate Cache Mode (Output)
This signal reflects the isolate cache mode bit in the CCC register. The core asserts cfgiscp to inform the MMU that memory updates should not extend past the primary data cache and that addresses are not to be translated. When the MMU receives this indication, it disables the TLB.

cfgtep: Configuration TLB Enable (Output)
cfgtep is the TLB enable bit from the CCC register. The core asserts this signal to enable the TLB, if one is installed.

cfgwbp: Configuration Cache Writeback Mode (Output)
This signal reflects the writeback mode bit in the CCC register. The core asserts cfgwbp to indicate that store operations are performed with the writeback policy.

cpsreqn[3:1]: Coprocessor Stall Request (Input)
The external coprocessors assert these signals when they need to request a pipeline stall. Coprocessors can assert cpsreqn[3:1] while a previous coprocessor instruction is being executed, after decoding a coprocessor instruction, and after the RD stage. When one of the cpsreqn[3:1] signals is asserted, the core asserts the pstalln signal.

cpzexensp: EX Stage Strobe Enable (Output)
The core asserts cpzexensp to indicate that the EX stage pipeline clock is enabled.

cpzqensp: Q Stage Strobe Enable (Output)
The core asserts cpzqensp to indicate that the Q stage pipeline clock is enabled.

cpzrdensp: RD Stage Strobe Enable (Output)
The core asserts cpzrdensp to indicate that the RD stage pipeline clock is enabled.

cpzstallp: Stall Request from CP0 (Output)
cpzstallp indicates that the internal pipeline stages have entered a stall condition by executing a WAITI (wait for interrupt) instruction. The core asserts cpzstallp when the instruction is at the WB stage of the pipeline, and the
signal remains active until the core receives an external exception (an enabled external interrupt, nmin, cold reset, or warm reset).

cresetn: Cold System Reset (Input)
Asserting this signal asynchronously resets the MMU by initializing all internal states. cresetn must be deasserted synchronously on the rising edge of sclkp.

dattlbxp: TLB Exception Is for Data Access (Input)
The MMU asserts this signal during the CR stage to inform the core that a data load/store operation has caused a TLB exception. The tlbmissp and tlbmodp signals must be checked to determine the cause of the exception:

  tlbmissp   tlbmodp   Exception cause
  0          0         TLB invalid
  0          1         TLB mod
  1          0         TLB miss
  1          1         Illegal

dc0tagdop[23:0]: D-Cache Set 0 Tag Data Out (Input)
D-cache set 0 (in a two-way set associative cache) drives these signals with the appropriate tag contents.

dc1tagdop[23:0]: D-Cache Set 1 Tag Data Out (Input)
D-cache set 1 (in a two-way set associative cache) drives these signals with the appropriate tag contents.

dpaddrp[31:12]: Data Access Physical Address (Input)
The MMU drives these signals with the TLB-translated physical address. If the TLB is disabled, the MMU/TLB passes the original data virtual address back to the core.

duncachep: Data Access Request Is Uncached (Input)
The MMU asserts this signal when it detects an access to memory segment kseg1, which is unmapped and uncached, based on address bits dvaddrp[31:28]. Asserting this signal informs the core D-cache controller
that it should not cache read/write operations to the address.

dvaddrp[31:0]: Data Virtual Address (Output)
The core uses these signals to output the virtual address of a load or store instruction being executed in the EX stage. The address is valid when dvaddrvp is asserted.

dvaddrvp: Data Access Request Valid (Output)
The core asserts this signal to inform the MMU that dvaddrp[31:0] is valid and that the MMU should perform an address translation.

ifetlbxp: TLB Exception Is for Instruction Fetch (Input)
The MMU asserts this signal during the CR stage if a TLB exception occurs as a result of an instruction fetch. The tlbmissp and tlbmodp signals must be checked to determine the cause of the exception:

  tlbmissp   tlbmodp   Exception cause
  0          0         TLB invalid
  0          1         TLB mod
  1          0         TLB miss
  1          1         Illegal

ifnseqp: I-Fetch Access Is the Target of a Branch/Jump (Output)
The core asserts this signal to indicate that the associated instruction fetch (I-fetch) in the IF stage of the pipeline is not word-sequential. When asserted, ifnseqp indicates that the previous instruction, now in the Q or RD stage, was a branch/jump instruction.

ipaddrp[31:12]: I-Fetch Physical Address (Input)
The MMU drives these signals with the TLB-translated instruction fetch physical address. If the TLB is disabled, the MMU returns the original fetch address untranslated.

isustallp: Stall Request from ISU (Output)
isustallp is the stall request signal from the ISU. The core asserts this signal to inform the MMU that it should halt operations.
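A sketch of the decode an MMU or a verification model might apply: the tlbmissp/tlbmodp table shared by dattlbxp and ifetlbxp, and a kseg1 test on address bits [31:28] that an MMU could use to drive duncachep. It relies on the standard MIPS mapping of kseg1 to 0xA0000000-0xBFFFFFFF; the function names are hypothetical.

```python
TLB_CAUSE = {
    (0, 0): "TLB invalid",
    (0, 1): "TLB mod",
    (1, 0): "TLB miss",
    (1, 1): "illegal",
}


def tlb_exception_cause(tlbmissp: int, tlbmodp: int) -> str:
    """Decode the cause reported with dattlbxp or ifetlbxp."""
    return TLB_CAUSE[(tlbmissp & 1, tlbmodp & 1)]


def is_kseg1(vaddr: int) -> bool:
    """kseg1 spans 0xA0000000-0xBFFFFFFF: address bits [31:28] are 0xA or 0xB."""
    return ((vaddr >> 28) & 0xF) in (0xA, 0xB)
```

For instance, an access to 0xBFC00000 (inside kseg1) would have duncachep asserted, while 0x80000000 (kseg0) would not.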
iuncachep: instruction fetch data is uncached (input)
the mmu asserts this signal when it detects unmapped accesses to kernel segment kseg1 (uncached memory), or when the tlb cache information marks a kuseg or kseg2 page as uncached. asserting iuncachep informs the core i-cache controller that the present ifetch is uncached.

ivaddrp[31:0]: ifetch virtual address (output)
these signals hold the virtual address of the instruction that is to be translated into a physical address.

ivaddrvp: ifetch request valid (output)
the core asserts this signal to inform the mmu that ivaddrp[31:0] holds a valid address and that the mmu should perform an instruction fetch address translation.

lsustallp: stall request from lsu (output)
lsustallp is the stall request signal from the lsu. the core asserts this signal to inform the mmu that it should halt operations.

mmudataip[31:0]: mmu register data input bus (output)
this bus transfers data from the core to the mmu (or cp0). mmudataip[31:0] transfers data to the mmu registers based on mtc0 instructions.

mmudataop[31:0]: mmu register data output bus (input)
this bus transfers data from the mmu (or cp0) to the core. mmudataop[31:0] transfers data to the core registers based on mfc0 instructions.

mmuenwrp: mmu register write enable (output)
the core asserts this signal to inform the mmu that it should write the data from cptocdp into the mmu register selected by mmuregsp[3:0]. the mmu should latch the data into the mmu register at the beginning of the wb stage.

mmuregsp[3:0]: mmu data register select, read/write (output)
the mmu decodes mmuregsp[3:0] to determine the target of an mmu register read/write operation.
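the kseg1 checks behind duncachep and iuncachep can be sketched in c. this assumes the standard 32-bit mips memory map (kuseg below 0x80000000, kseg0 at 0x80000000, kseg1 at 0xa0000000, kseg2 from 0xc0000000); kseg0 is not named in this section, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* 32-bit mips virtual address segments (standard map, assumed here) */
typedef enum { SEG_KUSEG, SEG_KSEG0, SEG_KSEG1, SEG_KSEG2 } vseg;

static vseg segment_of(uint32_t va)
{
    if (va < 0x80000000u) return SEG_KUSEG; /* mapped; cacheability from the tlb */
    if (va < 0xA0000000u) return SEG_KSEG0; /* unmapped, cached */
    if (va < 0xC0000000u) return SEG_KSEG1; /* unmapped, uncached */
    return SEG_KSEG2;                       /* mapped; cacheability from the tlb */
}

/* kseg1 test using only the top nibble, i.e. dvaddrp[31:28] is 0xa or 0xb */
static bool is_kseg1_from_top_nibble(uint32_t va)
{
    uint32_t top = va >> 28;
    return top == 0xAu || top == 0xBu;
}

/* unmapped kseg1 addresses translate by clearing the top three address bits */
static uint32_t kseg1_to_physical(uint32_t va)
{
    return va & 0x1FFFFFFFu;
}
```

for example, the reset vector 0xbfc00000 lies in kseg1 and maps to physical address 0x1fc00000 without any tlb lookup.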
if the operation is a write into the mmu registers, the mmu should latch the data into the mmu registers at the beginning of the wb stage. if the operation is a read, the mmu should provide valid data during the cr stage. mmuregsp[3:0] is valid in the cr pipeline stage. the register encodings are:

mmuregsp[3:0]  mmu register selected
0000           index
0001           random
0010           entrylo
0100           context
0101           pagemask
0110           wired
1010           entryhi

mmustallp: mmu through cp0 stall (input)
the mmu asserts this signal to cause the core to stall while the mmu is servicing an itlb translation miss.

pcancelp: pipeline cancel signal from cp0 (output)
the core asserts this signal to inform the mmu that it should clear exception registers and pipeline information. the core asserts pcancelp when an instruction that generates an exception enters the cr pipeline stage.

realmmup: real mmu installed indication (input)
the mmu asserts this signal to inform the core that a fully functional mmu with a tlb is installed.

selqnsp: select q stage, no swap (output)
the core asserts this signal to indicate that data from the q stage must be fed into the rd stage of the pipeline. the mmu selects the ifetch stage data if selqnsp is deasserted, or the q stage data if selqnsp is asserted. when selqnsp is asserted, it also indicates that signals from the q stage are valid.

stpeftchp: stop external fetch signal to isu (input)
the mmu asserts this signal to stop the isu from making fetch requests to external memory. assertion of this signal informs the isu that the translated physical address is invalid.
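the mmuregsp[3:0] table maps naturally onto a switch. the codes listed match the standard mips cp0 register numbers for index (0), random (1), entrylo (2), context (4), pagemask (5), wired (6) and entryhi (10). this sketch uses hypothetical enum names and treats any code not in the table as unlisted rather than guessing its meaning.

```c
/* mmu registers selectable through mmuregsp[3:0] (hypothetical names) */
typedef enum {
    MMU_REG_INDEX, MMU_REG_RANDOM, MMU_REG_ENTRYLO, MMU_REG_CONTEXT,
    MMU_REG_PAGEMASK, MMU_REG_WIRED, MMU_REG_ENTRYHI, MMU_REG_UNLISTED
} mmu_reg;

static mmu_reg decode_mmuregsp(unsigned mmuregsp)
{
    switch (mmuregsp & 0xFu) {          /* only the four select bits matter */
    case 0x0: return MMU_REG_INDEX;
    case 0x1: return MMU_REG_RANDOM;
    case 0x2: return MMU_REG_ENTRYLO;
    case 0x4: return MMU_REG_CONTEXT;
    case 0x5: return MMU_REG_PAGEMASK;
    case 0x6: return MMU_REG_WIRED;
    case 0xA: return MMU_REG_ENTRYHI;
    default:  return MMU_REG_UNLISTED;  /* code not listed in the table */
    }
}
```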
suspexn: suspend ex stage (output)
the isu asserts suspexn to request that the mmu suspend the instruction in the ex stage. the instruction in the ex stage must be held until the isu deasserts suspexn. instructions in the cr and wb stages must be completed.

tagmtch0p: d-cache set 0 tag match (input)
the mmu asserts this signal to inform the core that a memory access has hit in the cache because the tag information matches for d-cache set 0. a custom mmu should include comparators that perform this function. the tag data from dc0tagdop[23:2] should be taken from the data out of the tag rams and compared to the translated physical address, dpaddrp[31:12], together with dvaddrp[11:10]. note that the size of the comparators depends on the size of the tags. see chapter 6, CW4011 caches, for further information.

tagmtch1p: d-cache set 1 tag match (input)
the mmu asserts this signal to inform the core that a memory access has hit in the cache because the tag information matches for d-cache set 1. a custom mmu should include comparators that perform this function. the tag data from dc1tagdop[23:2] should be taken from the data out of the tag rams and compared to the translated physical address, dpaddrp[31:12], together with dvaddrp[11:10]. note that the size of the comparators depends on the size of the tags. see chapter 6, CW4011 caches, for further information.

testmp: test mode enable (input)
testmp is used for scan chain testing. it is a static input and must be tied low during normal operation and high for scan chain testing.

tlbmissp: tlb miss exception (input)
the mmu asserts tlbmissp to inform the core that the mmu exception was due to a tlb miss.

tlbmodp: tlb modified exception (input)
the mmu asserts tlbmodp to inform the core that the mmu exception was caused by a store to a page that is not marked dirty or writable.
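the comparator a custom mmu needs for tagmtch0p/tagmtch1p can be sketched as follows. the widths line up: dcNtagdop[23:2] is 22 bits, and dpaddrp[31:12] (20 bits) concatenated with dvaddrp[11:10] (2 bits) is also 22 bits. the sketch assumes full 32-bit addresses are passed in, ignores tag bits [1:0] exactly as the text does, and leaves out any valid-bit qualification, which this section does not describe.

```c
#include <stdbool.h>
#include <stdint.h>

/* build the 22-bit value {dpaddrp[31:12], dvaddrp[11:10]} */
static uint32_t tag_compare_value(uint32_t paddr, uint32_t vaddr)
{
    uint32_t ppn  = (paddr >> 12) & 0xFFFFFu; /* dpaddrp[31:12], 20 bits */
    uint32_t vidx = (vaddr >> 10) & 0x3u;     /* dvaddrp[11:10], 2 bits */
    return (ppn << 2) | vidx;
}

/* model of one tag-match comparator; tagdo is the 24-bit tag ram output */
static bool tag_match(uint32_t tagdo, uint32_t paddr, uint32_t vaddr)
{
    uint32_t stored = (tagdo >> 2) & 0x3FFFFFu; /* dcNtagdop[23:2], 22 bits */
    return stored == tag_compare_value(paddr, vaddr);
}
```

as a worked case, physical address 0x12345000 with virtual address bits [11:10] = 3 gives a compare value of 0x48d17, which matches a tag ram output of 0x12345c.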
tlbpp: tlb probe request (output)
the cp0 asserts this signal to probe the tlb (when the probe tlb for matching entry instruction is valid in the ex stage). the tlb places the probe results in the index register.

tlbrp: tlb read request (output)
the cp0 asserts this signal to request a read transaction from the tlb (when the read indexed tlb entry instruction is valid in the ex stage). the tlb places the data in the entryhi, entrylo, and pagemask registers.

tlbwip: tlb write index request (output)
the cp0 asserts this signal to request an indexed write transaction to the tlb (when the write indexed tlb entry instruction is valid in the ex stage). data from the pagemask, entryhi, and entrylo registers is written into the tlb entry defined by the index register.

tlbwrp: tlb write random request (output)
the cp0 asserts this signal to request a random write transaction to the tlb (when the write random tlb entry instruction is valid in the ex stage). data from the pagemask, entryhi, and entrylo registers is written into the tlb entry defined by the random register.

vldtlbxp: valid tlb exception in cr stage (output)
the core asserts this signal to indicate that a tlb exception occurred and is being reflected to the isu. exceptions are handled in the cr pipeline stage. the mmu must then load the entryhi and context registers with the failing virtual page number.

wresetn: warm system reset (input)
to perform a warm reset, wresetn must be asserted and then deasserted synchronously on the rising edge of sclkp. while it is asserted, mmu internal states are initialized; when it is deasserted, the cp0 generates a warm reset exception.
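the difference between tlbwip and tlbwrp is only where the target entry number comes from. a minimal sketch, assuming the entry numbers have already been extracted from the index and random registers (their full bit layouts are not given here) and that only one request is asserted at a time; the function name is hypothetical.

```c
#include <stdbool.h>

/* returns the tlb entry a write request targets, or -1 for no write or an
   unexpected simultaneous request; index_entry and random_entry are entry
   numbers taken from the index and random registers */
static int tlb_write_target(bool tlbwip, bool tlbwrp,
                            unsigned index_entry, unsigned random_entry)
{
    if (tlbwip && !tlbwrp) return (int)index_entry;  /* indexed write */
    if (tlbwrp && !tlbwip) return (int)random_entry; /* random write */
    return -1;
}
```

in both cases the data written is the same (pagemask, entryhi and entrylo); only the slot selection differs.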
7.11 mmu to shell interface
these signals interface the mmu unit to external logic, but do not directly interface with the CW4011 core.

frcmn: force cache miss (mmu input)
asserting frcmn forces a cache miss for both the i-cache and the d-cache; the core treats the transaction as an access to an uncached area. frcmn is useful for debugging the system. it is a static input and is tied low for software debugging.

7.12 multiply/divide unit (mdu) interface
the following signals interface the CW4011 core with a high-performance mdu. the mdu may be either an lsi logic building block or a user-designed mdu.

brlikfn: branch likely if even slot is false (output)
the core asserts brlikfn when a branch-likely instruction is in an even slot and the branch is not taken. when brlikfn is asserted, the instruction in the odd slot must be nullified, even if it has started.

hilobusyp: hi/lo register busy signal (input)
the mdu asserts this signal to inform the core that either the hi or lo register of the mdu is busy. mtlo/mthi instructions and instructions involving the hi/lo registers cannot be performed in the present cycle. more specifically, the mdu asserts hilobusyp to prevent these instructions from entering the ex stage.

inste[31:0]: even instruction code (output)
the core drives these signals with the instruction code for the instruction in the rd stage of the even pipeline. the mdu must decode the instruction to see if it is intended for the mdu.

insto[31:0]: odd instruction code (output)
the core drives these signals with the instruction code for the instruction in the rd stage of the odd pipeline. the mdu must decode the instruction to see if it is intended for the mdu.

mdbusyp: mdu busy (input)
the mdu asserts this signal to inform the isu that it is busy and cannot accept any new instructions.

mdresp[31:0]: multiply/divide instruction result (input)
the mdu drives these signals with the output for an instruction in the ex stage. depending on the type of instruction, and whether the instruction involves the mdu, the value of mdresp[31:0] is written back to a core general-purpose register later in the pipeline. instructions that return values into the core general-purpose registers include (but are not limited to) mfhi, mflo, min, max, and some of the other CW4011 extensions.

pcancelp: pipeline cancel signal from cp0 (output)
the core asserts pcancelp when an instruction that generates an exception enters the cr pipeline stage.

pcanoddn: pipeline cancel is for odd slot (output)
this signal tells the mdu whether the cancellation applies to the odd slot only or to both slots. when the core asserts pcanoddn, cancellation applies to the odd slot; when the core deasserts it, cancellation applies to both the even and odd slots. pcanoddn is valid at the cr stage of the pipeline.

pstallp: pipeline stall signal from isu (output)
the core asserts this signal to indicate that the pipeline is stalled. the mdu should halt operation until this signal is deasserted. pstalln is also asserted for any pipeline stall.

realmaddp: multiplier supports accumulate operation (input)
realmaddp, tied high, informs the core that the mdu supports madd and msubb instructions. this signal is ignored if realmultp is deasserted.

realmultp: high-performance multiplier is installed (input)
realmultp, tied high, informs the core that an mdu is installed. if the mdu supports madd and msubb instructions, realmaddp must also be asserted. realmultp is tied low if no mdu is installed.
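two of the rules above reduce to a couple of boolean expressions on the mdu side: which slots a pcancelp/pcanoddn pair cancels, and when the core treats madd/msubb as supported. the sketch abstracts away signal polarity (the n suffix on pcanoddn suggests active-low), so each argument means "asserted"; all names are hypothetical.

```c
#include <stdbool.h>

typedef struct { bool even; bool odd; } slot_cancel;

/* which slots are cancelled; arguments mean "signal asserted" */
static slot_cancel mdu_cancel(bool pcancel_asserted, bool pcanodd_asserted)
{
    slot_cancel c = { false, false };
    if (pcancel_asserted) {
        c.odd  = true;               /* the odd slot is cancelled in both cases */
        c.even = !pcanodd_asserted;  /* even slot too when pcanoddn is deasserted */
    }
    return c;
}

/* realmaddp only counts when realmultp reports an installed mdu */
static bool core_sees_madd(bool realmultp, bool realmaddp)
{
    return realmultp && realmaddp;
}
```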
regs0[31:0]: rs operand, even instruction (output)
the core drives these signals with the 32-bit value of the rs register. this rs value belongs to an instruction in the ex stage of the even pipeline. the mdu must decode the instruction from inste[31:0] to check whether the instruction involves the mdu and whether it uses the rs register. regs0[31:0] is valid only for mdu-specific instructions.

regs1[31:0]: rs operand, odd instruction (output)
the core drives these signals with the 32-bit value of the rs register. this rs value belongs to an instruction in the ex stage of the odd pipeline. the mdu must decode the instruction from insto[31:0] to check whether the instruction involves the mdu and whether it uses the rs register. regs1[31:0] is valid only for mdu-specific instructions.

regt0[31:0]: rt operand, even instruction (output)
the core drives these signals with the 32-bit value of the rt register. this rt value belongs to an instruction in the ex stage of the even pipeline. the mdu must decode the instruction from inste[31:0] to check whether the instruction involves the mdu and whether it uses the rt register. regt0[31:0] is valid only for mdu-specific instructions.

regt1[31:0]: rt operand, odd instruction (output)
the core drives these signals with the 32-bit value of the rt register. this rt value belongs to an instruction in the ex stage of the odd pipeline. the mdu must decode the instruction from insto[31:0] to check whether the instruction involves the mdu and whether it uses the rt register. regt1[31:0] is valid only for mdu-specific instructions.

resetn: reset signal (output)
the core asserts this signal to indicate that either a warm reset (wresetn) or a cold reset (cresetn) has occurred. the core resets by jumping to the reset exception code. building block modules, such as the mdu, must also reset.
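the decode the mdu performs on inste/insto can be sketched for the base mips i hi/lo instructions, whose encodings are standard (special opcode 0; funct 0x10 to 0x13 for mfhi, mthi, mflo, mtlo; 0x18 to 0x1b for mult, multu, div, divu). the CW4011 extensions (min, max, madd, msubb) are deliberately not covered because their encodings are not given here, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* standard mips r-type register fields */
static unsigned field_rs(uint32_t insn) { return (insn >> 21) & 0x1Fu; }
static unsigned field_rt(uint32_t insn) { return (insn >> 16) & 0x1Fu; }

/* true for base mips i hi/lo instructions; extensions are not covered */
static bool is_base_mdu_insn(uint32_t insn)
{
    unsigned opcode = insn >> 26;
    unsigned funct  = insn & 0x3Fu;
    if (opcode != 0) return false;                 /* special opcode only */
    return (funct >= 0x10 && funct <= 0x13)        /* mfhi, mthi, mflo, mtlo */
        || (funct >= 0x18 && funct <= 0x1B);       /* mult, multu, div, divu */
}

/* does the instruction read rs (regs0/regs1)? mthi, mtlo, mult/div do */
static bool uses_rs(uint32_t insn)
{
    unsigned funct = insn & 0x3Fu;
    return is_base_mdu_insn(insn) && (funct == 0x11 || funct == 0x13 || funct >= 0x18);
}

/* does the instruction read rt (regt0/regt1)? only mult/div variants do */
static bool uses_rt(uint32_t insn)
{
    return is_base_mdu_insn(insn) && (insn & 0x3Fu) >= 0x18;
}
```

for example, mult $a0, $a1 encodes as 0x00850018 (rs = 4, rt = 5), while mflo $v0 (0x00001012) uses neither operand bus.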
suspexn: suspend ex stage (output)
the isu asserts suspexn to request that the mdu suspend the instruction in the ex stage. the instruction in the ex stage must be held until the isu deasserts suspexn. instructions in the cr and wb stages must be completed.

testmp: test mode enable (input)
testmp is used for scan chain testing. it is a static input and must be tied low during normal operation and high for scan chain testing.

u1eenstp: even instruction targeting u1 unit (output)
the core asserts this signal to inform the mdu that an even pipeline instruction is targeting the u1 unit. this instruction is in the rd stage and will move into the ex stage in the next clock cycle. instructions targeting the u1 unit may be multiply, divide, or shift operations. the building block module must decode the instruction from the even pipeline to check whether it targets the mdu. u1eenstp serves as a warning signal for the mdu.

u1oenstp: odd instruction targeting u1 unit (output)
the core asserts this signal to inform the mdu that an odd pipeline instruction is targeting the u1 unit. this instruction is in the rd stage and will move into the ex stage in the next clock cycle. instructions targeting the u1 unit may be multiply, divide, or shift operations. the building block module must decode the instruction from the odd pipeline to check whether it targets the mdu. u1oenstp serves as a warning signal for the mdu.

7.13 miscellaneous signals
this section describes the miscellaneous CW4011 core signals.

bendn: big endian (input)
bendn is a static input and must be tied low for big-endian addressing and high for little-endian addressing. bendn affects the byte positions for sizing and load/store data alignment.
in big-endian mode, the upper and lower 32-bit halves of scdip must be swapped outside the core, and likewise the upper and lower 32-bit halves of scdop.

implop[3:0]: prid imp_lo bus (input)
these signals set the lower four bits of the imp number in the processor revision identifier (prid) register. designers can place whatever value they wish on this bus to identify their product revision. instructions reading from the prid register return this value for the lower four bits. the upper four bits of the imp number are set to 0b1000 by lsi logic.

revlop[3:0]: prid rev_lo bus (input)
these signals set the lower four bits of the rev number in the prid register. designers can place whatever value they wish on this bus to identify their production revision. instructions reading from the prid register return this value for the lower four bits. the upper four bits of the rev number are set to 0b0000 by lsi logic.

scanreqp: scan debug event (output)
this signal is not used in this design. you may leave this signal open (unconnected).

sclkp: system clock (input)
sclkp is the processor system clock input. it provides basic timing for the core and determines instruction cycle times. internal core logic operates synchronously with the rising edge of sclkp. since the core processor operates at 90 mhz, you must supply a 90 mhz clock. sclkp is used for all core modules.

testmp: test mode enable (input)
testmp is used for scan chain testing. it is a static input and must be tied low during normal operation and high for scan chain testing.
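two of the shell-side obligations above are simple bit manipulations: swapping the 32-bit halves of the 64-bit scbus data in big-endian mode, and forming the imp and rev numbers from the fixed lsi logic nibbles plus implop/revlop. the prid layout used here (imp in bits [15:8], rev in bits [7:0]) is the standard mips one and is an assumption, as are the function names.

```c
#include <stdint.h>

/* swap the upper and lower 32-bit halves of a 64-bit scdip/scdop word */
static uint64_t swap_words(uint64_t d)
{
    return (d << 32) | (d >> 32);
}

/* imp number: upper nibble fixed at 0b1000, lower nibble from implop[3:0] */
static uint8_t prid_imp(unsigned implop)
{
    return (uint8_t)(0x80u | (implop & 0xFu));
}

/* rev number: upper nibble fixed at 0b0000, lower nibble from revlop[3:0] */
static uint8_t prid_rev(unsigned revlop)
{
    return (uint8_t)(revlop & 0xFu);
}

/* assumed standard mips prid layout: imp in [15:8], rev in [7:0] */
static uint32_t prid_value(unsigned implop, unsigned revlop)
{
    return ((uint32_t)prid_imp(implop) << 8) | prid_rev(revlop);
}
```

for example, tying implop to 0x3 and revlop to 0x2 would make software read an imp/rev pair of 0x83/0x02.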
chapter 8
interface operation
this chapter examines various CW4011 functional timing scenarios. it does not cover all timing cases, but concentrates on the major CW4011 core timing operations. for details of the operation of any signals discussed in this chapter, see chapter 7, CW4011 signals.
this chapter has the following sections:
section 8.1, reset and exception signals
section 8.2, scbus interface behavior
section 8.3, ocabus interface behavior
section 8.4, cache interface behavior
in the timing diagrams shown in this chapter, all inputs and all outputs are synchronized to the rising edge of the system clock. all inputs require setup and hold times, and all outputs have valid delay times from the clock edge to the appearance of a valid level.

8.1 reset and exception signals
the CW4011 has the following reset and exception inputs that connect to coprocessor 0:
cold reset
warm reset
nonmaskable interrupt
bus error
floating point unit exception
interrupts
the above inputs must be synchronized to the rising edge of the system clock. the remainder of this section discusses each of these inputs in detail, except the floating point unit exception. for more information on floating point exceptions, please contact your lsi logic representative.

8.1.1 cold reset (cresetn)
the primary purpose of a cold reset is to initialize the CW4011 core at power-up. when asserted, cresetn initializes the internal states and control registers in the core. cresetn does not initialize the general-purpose registers, the i-cache, the d-cache, or the mmu tlb.
cresetn can be asserted asynchronously, but it must be active for at least two system clock cycles and must be deasserted on the rising edge of the system clock. the CW4011 treats cresetn as a nonmaskable exception, and the core is in idle mode while cresetn is asserted. figure 8.1 shows the timing for a cold reset and the start of an instruction fetch after cresetn is deasserted.

figure 8.1 cold reset and pipeline
[timing diagram over clock cycles t1 to t7: sclkp and cresetn; instructions 0 and 1 are cancelled during reset, then the reset instructions are fetched, with stalls due to i-cache misses]

8.1.1.1 handling cold resets
the cpu provides a special interrupt vector (0xbfc00000) for the cresetn exception. the reset vector resides in unmapped and uncached cpu virtual address space, so the hardware does not need to initialize the tlb or the cache to handle the exception. the processor can fetch and execute instructions while the caches and virtual memory are in an undefined state. for further information on this subject, refer to section 4.4.5.1, cold reset exception.
the contents of all registers in the cpu are undefined when the cresetn exception occurs, except for the following:
in the status register, the cu[3:0] and sr bits are cleared to zero, and the erl and bev bits are set to one. the other bits in the register are undefined.
the random register is initialized to the value of its upper bound.
the wired register is initialized to zero.

8.1.1.2 servicing cold resets
to service the cresetn exception, initialize all processor registers, coprocessor registers, caches, and the memory system. you can do this by performing diagnostic tests and by bootstrapping the operating system.

8.1.2 warm reset (wresetn)
the primary purpose of the wresetn exception is to reinitialize the processor after a fatal error. when asserted, wresetn initializes the CW4011 internal states and control registers. wresetn does not initialize the general-purpose registers, the i-cache, the d-cache, or the mmu tlb.
wresetn must be asserted and deasserted on the rising edge of the system clock, and it must remain active for at least two system clock cycles. wresetn is a nonmaskable exception, and the CW4011 is in idle mode while wresetn is asserted. the start of the instruction fetch after wresetn is deasserted is the same as for cresetn, as shown in figure 8.1.

8.1.2.1 handling warm resets
the reset exception vector (0xbfc00000) is also used for the wresetn exception. the reset vector resides in unmapped and uncached cpu virtual address space, so the hardware does not need to initialize the tlb or the cache to handle the exception. the sr bit of the status register is set to distinguish the wresetn exception from the cresetn exception.
unlike a nonmaskable interrupt, wresetn resets the bus state machines. like cresetn, it can be applied to the processor in any state.
the contents of all registers are preserved when wresetn occurs, except for the following:
the errorpc register, which contains the restart pc
the erl and bev bits of the status register, which are set to one
the sr bit of the status register, which is set to one
because wresetn can abort cache and bus operations, cache and memory contents are undefined after the wresetn exception occurs. for further information on this subject, refer to section 4.4.5.2, warm reset exception.

8.1.2.2 servicing warm resets
to service wresetn, save the current processor state for diagnostic purposes, and then reinitialize all processor registers, the coprocessor, and the memory system.

8.1.3 nonmaskable interrupt (nmin)
the nonmaskable interrupt input nmin must be asserted and deasserted on the rising edge of the system clock. when nmin is sampled active on the rising edge of the clock, the cp0 provides the nonmaskable interrupt exception vector (0xbfc00000).
figure 8.2 shows the timing diagram for the fastest detected case. figure 8.3 shows the case in which nmin is not serviced immediately because of a pipeline stall. the CW4011 detects the falling edge of nmin and latches the signal until it is ready to service it.
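the status register effects of the three reset-vector exceptions can be modelled as bit operations, which also shows how a handler can tell them apart through the sr bit. the bit positions (cu[3:0] in bits 31:28, bev bit 22, sr bit 20, erl bit 2) are the standard r4000-style layout and are an assumption here, since this section does not give them; bits the text calls undefined after a cold reset are simply passed through.

```c
#include <stdbool.h>
#include <stdint.h>

/* assumed r4000-style status bit positions (not given in this section) */
#define ST_CU_MASK 0xF0000000u  /* cu[3:0] */
#define ST_BEV     (1u << 22)
#define ST_SR      (1u << 20)
#define ST_ERL     (1u << 2)

/* cold reset: cu and sr cleared, erl and bev set (other bits undefined) */
static uint32_t status_after_cold_reset(uint32_t status)
{
    status &= ~(ST_CU_MASK | ST_SR);
    return status | ST_ERL | ST_BEV;
}

/* warm reset or nmi: erl, bev and sr set, everything else preserved */
static uint32_t status_after_warm_reset_or_nmi(uint32_t status)
{
    return status | ST_ERL | ST_BEV | ST_SR;
}

/* a handler at 0xbfc00000 can test sr: set for warm reset or nmi */
static bool entered_by_warm_reset_or_nmi(uint32_t status)
{
    return (status & ST_SR) != 0;
}
```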
figure 8.2 nmin and pipeline (detected immediately)
[timing diagram over clock cycles t1 to t6: sclkp, nmin, internal nmi (latched), pcancrn, pcanoddn, pstalln; instruction pairs 0/1 to 6/7 and the exception fetch]
figure 8.3 nmin and pipeline (nmin is not detected immediately due to stall)
[timing diagram over clock cycles t1 to t9: sclkp, nmin, internal nmin (latched), pcancrn, pcanoddn, pstalln; instruction pairs 0/1 to 6/7 are held by the stall before being cancelled, followed by the exception fetch]

8.1.3.1 handling a nonmaskable interrupt
the reset exception vector (0xbfc00000) is also used for the nmin exception. the reset vector resides in unmapped and uncached cpu address space, so the hardware does not need to initialize the tlb or the cache to handle nmin. the sr bit of the status register is set to differentiate this exception from a cresetn exception.
because an nmin can occur in the middle of another exception, program execution cannot continue after nmin has been serviced.
unlike a cold or warm reset, but like other exceptions, a nonmaskable interrupt is taken only at instruction boundaries. the nmin exception preserves the state of the caches and memory system. for further information on this subject, refer to section 4.4.5.2, warm reset exception.
the contents of all registers in the cpu are preserved when this exception occurs, except for the following:
the errorpc register, which contains the restart pc
the erl and bev bits of the status register, which are set to one
the sr bit of the status register, which is set to one

8.1.3.2 servicing a nonmaskable interrupt
to service the nmin exception, save the current processor state for diagnostic purposes, and then reinitialize the system, including all processor registers, coprocessor registers, caches, and the memory system.

8.1.4 bus error (scberrn)
a bus error exception occurs when board-level circuitry detects events such as bus time-outs, bus parity errors, and invalid physical memory accesses. the scberrn exception is not maskable.
in the CW4011, bus errors are asynchronous events with respect to cpu instruction processing (much like the nmin interrupt), which means that there is no attempt to identify the instruction that was the root source of the error.
the scberrn input from the scbus interface terminates a transaction and generates an exception to inform the CW4011 that an scbus transaction has not completed successfully. the CW4011 detects the assertion of scberrn while it is driving the scbus. scberrn should be asserted as a synchronous strobe lasting one clock cycle; the CW4011 latches it until it is serviced.
figure 8.4 shows the timing diagram in which scberrn is serviced immediately, and figure 8.5 shows how the exception is serviced later because of stall cycles.
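the latching behavior described for scberrn (a one-cycle strobe held until serviced) and for nmin (falling edge detected, then held) can be sketched as a per-clock model. this is a simplified illustration, not the core's actual logic: "serviced" stands for whatever condition lets the core take the exception at that edge (for example, no pipeline stall), and all names are hypothetical.

```c
#include <stdbool.h>

/* latch state after one rising edge of sclkp: set by a strobe, held while
   the exception cannot be taken, cleared once it is taken */
static bool latch_next(bool pending, bool strobe, bool serviced)
{
    return (pending || strobe) && !serviced;
}

/* true when the exception is actually taken at this edge */
static bool latch_take(bool pending, bool strobe, bool serviced)
{
    return (pending || strobe) && serviced;
}

/* nmin is active-low: a falling edge is a high level followed by a low one */
static bool nmi_falling_edge(bool prev_level_high, bool cur_level_high)
{
    return prev_level_high && !cur_level_high;
}
```

a stalled pipeline simply keeps "serviced" false, so the latched request survives the stall, which is the case shown for delayed servicing.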
figure 8.4 bus error and pipeline (detected immediately)
[timing diagram over clock cycles t1 to t6: sclkp, sctssn, scaoen, scberrn, internal bus error (latched), pcancrn, pstalln; instruction pairs 0/1 to 6/7 and the exception fetch]
figure 8.5 bus error and pipeline (with stall cycles)
[timing diagram over clock cycles t1 to t8: sclkp, sctssn, scaoen, scberrn, internal bus error (latched), pcancrn, pstalln; instructions are held in the ex and cr stages during stall cycles before the exception fetch]

8.1.4.1 handling bus errors
the common exception vector, shown in table 8.1, is used for the scberrn exception. the exccode field in the cause register is set to bus.

table 8.1 common exception vector
status register bev   ccc register: r3000 mode   ccc register: r4000 mode
0                     0x80000080                 0x80000180
1                     0xbfc00180                 0xbfc00380
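table 8.1 is a two-input lookup. a minimal sketch: the r4000_mode flag stands for whichever ccc register setting selects r4000-compatible vectors, whose exact bit is not named in this section, so the parameter name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* common exception vector per table 8.1 */
static uint32_t common_exception_vector(bool bev, bool r4000_mode)
{
    if (!bev)
        return r4000_mode ? 0x80000180u : 0x80000080u;  /* kseg0, cached */
    return r4000_mode ? 0xBFC00380u : 0xBFC00180u;      /* kseg1, uncached */
}
```

with bev set, as it is after any reset, the vectors fall in uncached kseg1 next to the reset vector, so exceptions can be handled before the caches are initialized.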
the epc register points at the first instruction for which processing was not completed, unless this instruction is in a branch delay slot. in that case, the epc register points at the preceding branch instruction, and the bd bit of the cause register is set.

8.1.4.2 servicing bus errors
the physical address at which the fault occurred is not available to the exception handler. the process executing at the time of the exception must be handed a bus error signal, which is usually fatal.

8.1.5 external interrupts (exintn)
the CW4011 has six external interrupt inputs, exintn[5:0], which must be asserted and deasserted on the rising edge of the system clock. to mask all six external interrupts at once, clear the ie bit of the status register. to mask each interrupt individually, program the int bits in the status register. see section 4.3.6, status register, for further information about the status register.
the instruction fetch for the exception procedure starts two clocks after an external interrupt has been detected, provided that the pipeline is not in a stall state and there is no higher-priority exception. figure 8.6 shows the timing diagram where an interrupt is immediately detected.
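the two-level masking described above (a global ie bit plus per-interrupt bits) can be sketched as follows. the bit positions are assumptions taken from the standard mips layout: cause.ip and the status int/im mask in bits 15:8 (two software bits followed by six hardware bits), ie in status bit 0. an r4000-mode core would additionally gate on exl and erl, which this sketch leaves out.

```c
#include <stdint.h>

/* assumed standard mips positions (not given in this section) */
#define ST_IE          (1u << 0)
#define IP_SHIFT       8
#define IP_MASK        (0xFFu << IP_SHIFT)

/* returns the ip bits that are both pending and unmasked, 0 if none */
static uint32_t unmasked_pending(uint32_t cause, uint32_t status)
{
    if (!(status & ST_IE))
        return 0;                        /* ie clear: all interrupts masked */
    return ((cause & status) & IP_MASK) >> IP_SHIFT;
}
```

for example, a pending hardware interrupt in ip bit 2 (cause 0x400) is taken only when both status bit 10 and ie are set.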
figure 8.6 interrupt and pipeline (detected immediately)
[timing diagram over clock cycles t1 to t6: sclkp, exintn, pcancrn, pcanoddn, pstalln; instruction pairs 0/1 to 6/7 and the exception fetch]

an exintn exception is similar to an nmin exception, except that external interrupts are not latched internally and must remain asserted until they are serviced. if the pipeline is in a stall cycle, the CW4011 does not service interrupts until the stall condition is resolved.

8.1.5.1 handling external interrupts
the common exception vector is used for the exintn exception. the exccode field in the cause register is set to int (value 0). the ip field of the cause register indicates the current interrupt requests. more than one of these bits may be set at the same time. it is also possible that no bit is set, if an interrupt is asserted and then deasserted before the CW4011 reads the cause register.
the epc register points at the first instruction for which processing was not completed, unless this instruction is in a branch delay slot. in that case, the epc register points at the preceding branch instruction, and the bd bit of the cause register is set.
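the epc and bd rule that recurs in this section can be sketched in c. the sketch assumes the usual mips arrangement in which a branch occupies the word immediately before its delay slot, so the branch address is the delay-slot address minus 4; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint32_t epc; bool bd; } epc_result;

/* pc: address of the first instruction whose processing did not complete */
static epc_result compute_epc(uint32_t pc, bool in_delay_slot)
{
    epc_result r;
    r.bd  = in_delay_slot;                   /* cause.bd records the case */
    r.epc = in_delay_slot ? pc - 4u : pc;    /* point back at the branch */
    return r;
}
```

restarting at epc then re-executes the branch, which recomputes its outcome before the delay-slot instruction runs again.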
see section 4.4.6.2, interrupt exception, for further information on this subject.

8.1.5.2 servicing external interrupts
if one of the two software-generated interrupts caused the exception, clear the corresponding cause register bit to zero to clear the interrupt condition. if the interrupt is hardware generated, correct the condition that caused the assertion of the interrupt pin to clear the interrupt condition.

8.1.6 external vectored interrupt (exvintn)
the CW4011 has an external vectored interrupt input, exvintn. the exvap[31:2] inputs provide the interrupt vector virtual address, so the common exception vector base and offset are not used.
exvintn must be asserted and deasserted on the rising edge of the system clock. when exvintn is sampled active on the rising edge of the clock, cp0 samples an exception vector from exvap[31:2]. this feature is available when the enable bit evi in the ccc register is set. to mask exvintn, clear the ie bit of the status register.
figure 8.7 shows the fastest accepted case of exvintn. if the pipeline is stalled, more clock cycles are required. when exvapen is asserted, the system may drive exvap[31:2].
figure 8.7 fastest accepted case of external vectored interrupt
[timing diagram over clock cycles t1 to t7: sclkp, exvintn, exvap[31:2] (exception address), exvapen, pcancrn; instruction pairs 0/1 to 4/5 are cancelled and the exception instruction is fetched]

8.1.6.1 handling external vectored interrupts
the external vectored interrupt feature is available when the evi bit in the ccc register is set. exvintn has lower priority than the six external interrupts exintn[5:0], but higher priority than the debug exception. to mask exvintn, use the interrupt enable bit in the status register, as for the external interrupts.
if exvintn is accepted, the cp0 reads the exception vector address on exvap[31:2] and writes it directly into the program counter. a user-defined interrupt controller provides exvap[31:2], so that the CW4011 jumps directly to the interrupt handler when it is requested. exvap[31:2] must be stable until exvapen is asserted.
exvintn does not alter anything in the cause register except the bd bit. the epc register points at the first instruction for which processing was not completed, unless this instruction is in a branch delay slot. in that case, the epc register points at the preceding branch instruction, and the bd bit of the cause register is set.
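the priority order and the vector formation for exvintn can be sketched together. the pending flags are assumed to be already qualified by the ie bit, and the enum and function names are hypothetical; the handler address is exvap[31:2] with two zero bits appended, since only word-aligned addresses can be supplied.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { TAKE_NONE, TAKE_EXINT, TAKE_EXVINT, TAKE_DEBUG } irq_choice;

/* exintn[5:0] > exvintn > debug; exvintn only counts when ccc.evi is set */
static irq_choice pick_interrupt(bool exint_pending, bool exvint_pending,
                                 bool evi_enabled, bool debug_pending)
{
    if (exint_pending) return TAKE_EXINT;
    if (exvint_pending && evi_enabled) return TAKE_EXVINT;
    if (debug_pending) return TAKE_DEBUG;
    return TAKE_NONE;
}

/* new pc from the 30-bit exvap[31:2] value: bits [1:0] are zero */
static uint32_t exvint_target(uint32_t exvap_31_2)
{
    return exvap_31_2 << 2;
}
```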
refer to section 4.4.6.3, external vectored interrupt exception, for further information on this subject.

8.1.7 waiti instruction and cpzstallp
the CW4011 uses the waiti instruction, one of its extended instructions, to initiate a wait state. this stalls the pipeline and reduces power consumption while the CW4011 is inactive. the CW4011 wakes up when it detects an external exception input (enabled interrupt, nmin, warm reset, or cold reset). figure 8.8 shows the timing diagram for the waiti instruction.

figure 8.8 waiti and pipeline stall (cpzstallp)
[timing diagram over clock cycles t1 to t9: sclkp, pipeline stages of the waiti instruction and instructions 1 to 4, exintn, pstalln, cpzstallp, pcancrn]

the coprocessor interface signal pstalln is also asserted while the pipeline is in the stall condition. at t1, the cp0 starts executing a waiti instruction. at t3, in the wb stage of the pipeline, the cp0 requests a pipeline stall and the CW4011 asserts cpzstallp. at t6, an external interrupt input is asserted, and the CW4011 wakes up at t7. at t8, the instructions in the pipeline stages are cancelled, and the if stage for the exception starts at t9.

8.2 scbus interface behavior
the CW4011 generates one or more external data read/write transactions on the scbus under any of the following conditions:
uncached area instruction fetch
i-cache miss
uncached area data read/write
load d-cache miss
any store execution in writethrough mode
d-cache writeback
the scbus is a flexible address/data bus. it is demultiplexed and synchronized to the system clock. it has a data width of 64 bits, but supports one type of bus sizing, from a 64-bit width to a 32-bit width. the scbus has the following transaction data sizes: byte, halfword, tribyte, 32-bit word, 64-bit doubleword, or 8-word burst (4-doubleword burst), as shown in table 8.2.
the CW4011 has a four-line-depth write buffer for uncached, d-cache miss, and writethrough store operations. each line in the buffer contains 32 bits of address and 64 bits of data. if consecutive word stores target the same doubleword-aligned address, the two words are stored in one line. the CW4011 then requests a doubleword write transaction on the scbus, which the sizing function can separate into two 32-bit write transactions.

8.2.1 scbus basic transaction
figure 8.9 shows a basic scbus transaction for a single read or write. it is a three-clock-cycle transaction, which means that the scbrdyn assertion is sampled on the third rising clock edge from the beginning of the transaction. the fastest transaction takes one clock cycle, in which case sctssn remains asserted continuously if the next transaction starts immediately after the current one. there is no limit on the maximum number of clock cycles for a transaction, so a bus watchdog timer must be designed outside the core to assert the bus error signal, scberrn, when a transaction lasts longer than the specification allows.

table 8.2 scbus transaction types
cause of scbus transaction | transaction type | no. of bytes
uncached instruction fetch | doubleword | 8
instruction cache miss | 8-word burst | 32
data read by uncached load instruction | byte, halfword, tribyte, word | 1, 2, 3, 4
data read by d-cache-miss load instruction | 8-word burst | 32
data write by uncached store instruction | byte, halfword, tribyte, word, doubleword | 1, 2, 3, 4, 8
data write by d-cache-miss store instruction | byte, halfword, tribyte, word, doubleword | 1, 2, 3, 4, 8
data write by writethrough store instruction | byte, halfword, tribyte, word, doubleword | 1, 2, 3, 4, 8
data write by writeback | 8-word burst | 32
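the merging rule for the four-line write buffer can be sketched as a small behavioral model. this is a hypothetical illustration, not lsi logic rtl: the class and method names are invented, only 32-bit word stores are modelled, "continuous" is read as "consecutive" (only the newest line is a merge candidate), and the default little-endian lane placement (address bit 2 = 1 selects bits [63:32]) is an assumption.

```python
from dataclasses import dataclass

DEPTH = 4  # four-line-depth buffer, as stated in the text

@dataclass
class Line:
    addr: int       # doubleword-aligned 32-bit address
    data: int = 0   # 64 bits of data
    words: int = 0  # bit 0: lower word valid, bit 1: upper word valid

class WriteBuffer:
    """Hypothetical behavioral model of the store write buffer."""

    def __init__(self):
        self.lines = []

    def store_word(self, addr, value):
        """Queue a 32-bit store; return False when all four lines are full."""
        dw_addr = addr & 0xFFFFFFF8
        half = (addr >> 2) & 1          # assumption: bit 2 = 1 -> bits [63:32]
        shift = 32 * half
        # consecutive stores to the same doubleword share the newest line
        if self.lines and self.lines[-1].addr == dw_addr:
            line = self.lines[-1]
        elif len(self.lines) < DEPTH:
            line = Line(dw_addr)
            self.lines.append(line)
        else:
            return False
        line.data &= ~(0xFFFFFFFF << shift)
        line.data |= (value & 0xFFFFFFFF) << shift
        line.words |= 1 << half
        return True

    def drain(self):
        """Remove the oldest line as an scbus write: (address, data, bytes)."""
        line = self.lines.pop(0)
        if line.words == 0b11:
            return line.addr, line.data, 8        # one doubleword write
        offset = 4 if line.words == 0b10 else 0
        return line.addr + offset, line.data, 4   # single word write
```

a merged line drains as one doubleword request, which the sizing logic described in section 8.2.7 may still split into two word transactions.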
[figure 8.9 scbus basic transaction: timing diagram of sclkp, scaop[31:0] (address), scaoen, scdip[63:0] (read cycle data), scdop[63:0] (write cycle data), scdoen (high = read, low = write), sctssn (asserted at the first cycle), sctbstn and sctpwn (low for in-page write), sctsen, scbrdyn, sctben[7:0], and sctrqn over clock cycles t1 to t3, followed by an arbitration cycle]

at the beginning of a transaction, the transaction start strobe signal (sctssn) is asserted for one clock cycle. in addition, the address is output on the scaop[31:0] lines, and the address output enable signal (scaoen) is asserted to indicate that scaop[31:0] is valid. the byte enable signals (sctben[7:0]) are also output. if the transaction is a four-doubleword burst, sctbstn is asserted during the first transaction. if the transaction is an in-page write, which means that the next transaction is in the same page, sctpwn is asserted; it is not asserted for burst write transactions. the sctbstn and sctpwn status indication signals are valid by the end of the transaction.
if the transaction is a data write, data is output on the scdop[63:0] lines and scdoen is asserted from the beginning to the end of the transaction. if the transaction is a data read or instruction fetch, the scdip[63:0] signal lines are sampled on the clock edge at which the ready input scbrdyn is asserted. scdoen thus indicates the read/write direction of the transaction and controls the three-state buffers external to the CW4011.

asserting scbrdyn terminates the transaction. at the same time, the bus size input signal (scb32n) is sampled. based on this input, the biu of the CW4011 determines the valid byte positions for read transaction bus sizing. if scb32n is asserted for a doubleword transaction, the bus interface generates a subsequent transaction for bus sizing. the bus in-page write accept input (scbpwan) is also sampled in an in-page write transaction. if scbpwan is deasserted, the bus interface arbitrates bus requests even if the next transaction is a write transaction in the same memory page. if scbpwan is asserted, the bus interface does not arbitrate bus requests, and the next transaction must be a write transaction in the same memory page. if scbpwan is asserted during the in-page write transaction but sctsen is deasserted, the next transaction is still a write transaction in the same page.

during an instruction fetch transaction, the CW4011 asserts scifetn for the same period as scaoen, so that the transaction can be monitored externally.

8.2.2 scbus burst transaction
when an i-cache miss occurs, the isu requests an 8-word (4-doubleword) block burst read transaction. when a d-cache miss occurs, the lsu requests an 8-word block burst read transaction. the lsu also requests an 8-word block burst write transaction for d-cache writeback. figure 8.10 shows an eight-word burst read/write transaction that consists of four continuous transactions.
[figure 8.10 scbus eight-word burst transaction timing chart: timing diagram of sclkp, scaop[31:0] (address, address + 8, address + 16, address + 24), scaoen, scdip[63:0] (read cycle), scdop[63:0] (write cycle), scdoen (high = read, low = write), sctssn, sctbstn, sctbln, sctben[7:0], scbrdyn, and sctsen over clock cycles t1 to t8]

in the first transaction, the burst transaction indicator signal, sctbstn, is asserted to indicate an 8-word burst transaction. subsequent transactions are single doubleword transactions. each transaction is terminated by an assertion of the bus ready signal, scbrdyn. the transaction start enable signal, sctsen, is asserted for each transaction; burst transactions can be suspended by deasserting sctsen. the bus hold request signal, schrqn, is not accepted during a burst transaction if sctsen remains asserted when scbrdyn is asserted. schrqn is accepted if sctsen is deasserted to insert one or more idle cycles when scbrdyn is asserted.
sctbln, which indicates the last transaction of a burst or a single transaction, is deasserted (high) at the first, second, and third transactions of a four-doubleword burst transaction.

for a burst read transaction, the first address is the missed address, and the addresses of the subsequent transactions rotate, wrapping around within the block. for a burst write transaction, the first address is the beginning of the block, and subsequent addresses increment.

bus sizing is available for burst transactions so that the scbus can perform burst transactions to 32-bit-wide devices. the scb32n input must be asserted for each group of burst transactions. if 32-bit sizing is requested for a burst transaction, eight word transactions are generated, and sctbln is deasserted from the first through the sixth transaction. an in-page write transaction never occurs if the transaction is a burst write.

figure 8.11 shows a timing diagram for an eight-word burst transaction. if the bus slave of the transaction is a synchronous dram system, there are some wait cycles before the first data transfer but none for subsequent transfers. for a synchronous dram system, sctssn is asserted continuously for the second, third, and fourth data transfers. the dram controller generates the addresses for these data transfers itself, although scaop also outputs them.
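the burst address ordering can be sketched as follows for the unsized (four-doubleword) case. the sketch assumes that "rotative and wrap around" means linear 8-byte increments that wrap within the 32-byte block, starting at the doubleword that contains the missed address; the function name is hypothetical.

```python
BLOCK_BYTES = 32   # 8 words = 4 doublewords
DW_BYTES = 8

def burst_addresses(addr, is_read):
    """Return the four doubleword addresses driven on scaop for one burst.

    is_read=True models a cache refill (starts at the missed doubleword,
    then wraps); is_read=False models a writeback (starts at the block base).
    """
    base = addr & ~(BLOCK_BYTES - 1)
    if is_read:
        start = addr & (BLOCK_BYTES - 1) & ~(DW_BYTES - 1)  # missed doubleword
    else:
        start = 0                                           # block beginning
    return [base + (start + i * DW_BYTES) % BLOCK_BYTES for i in range(4)]
```

for example, a miss at 0x101c produces the read order 0x1018, 0x1000, 0x1008, 0x1010, while a writeback of the same block always starts at 0x1000.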
[figure 8.11 scbus eight-word burst transaction: timing diagram of sclkp, scaop[31:0] (address1 to address4), scaoen, scdip[63:0] (read cycle), scdop[63:0] (write cycle), scdoen (high = read, low = write), sctssn, sctbstn, sctbln, sctben[7:0], scbrdyn, and sctsen over clock cycles t1 to t8, with transfers data1 to data4]

figure 8.12 shows the first and second transactions of an eight-word burst read/write. transactions are suspended when sctsen is deasserted.
[figure 8.12 scbus eight-word burst transaction: timing diagram of sclkp, scaop[31:0] (address, address + 8), scaoen, scdip[63:0] (read cycle), scdop[63:0] (write cycle), scdoen (high = read, low = write), sctssn, sctbstn, sctben[7:0], sctsen, and scbrdyn over clock cycles t1 to t8]

if an individual transaction of a burst transaction is terminated while sctsen is deasserted, the next transaction cannot proceed immediately. in that case, a hold request can be inserted. a hold request can also be inserted if a retry occurs while sctsen is deasserted during a burst transaction.

8.2.3 scbus in-page write transaction
an in-page write transaction is one in which consecutive write accesses are made to the same row and page in a given address area. most types of dram support this type of fast access, which is also used to perform burst read/write transactions. the scbus supports consecutive write transactions that have the same upper address. the external write buffer in the lsu compares the upper address bits of the current write request with those of the next write transaction in the buffer and provides the result of the comparison to the bus interface. the address range is defined in the configuration register of the CW4011.

if the two addresses have the same upper bits, the in-page write output (sctpwn) is asserted to inform the external bus slave. the slave must assert the in-page write accept input (scbpwan) if it is able to accept in-page write transactions. if scbpwan is asserted, the interface does not arbitrate bus requests, and the next transaction must be a write transaction. if scbpwan is deasserted, the bus interface performs the next transaction according to the arbitration result. scbpwan is sampled when the bus interface samples an assertion of scbrdyn. the bus interface performs a write transaction if scbpwan is deasserted and there are no higher-priority requests. the scbpwan input has no meaning if the transaction is not an in-page write, and it is ignored when sctpwn is deasserted.

the bus interface does not count the number of consecutive in-page write transactions. it continues in-page writes until the write buffer is empty, a write transaction is not in the same page address area, or scbpwan is deasserted. when the biu deasserts the transaction start enable signal (sctsen), the CW4011 inserts one or more bus idle states between two in-page write transactions. however, the bus interface does not arbitrate requests during this idle state if the slave accepts the in-page write transactions. a hold request is allowed if the biu deasserts sctsen. the bus interface does not accept a bus hold request during in-page write transactions if sctsen remains asserted continuously. figure 8.13 shows an example of in-page write transactions.
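the in-page decision and its three stop conditions can be sketched as below. the page boundary is programmed in the CW4011 configuration register, so page_shift = 11 (compare address bits [31:11]) is a hypothetical value; the function names are also hypothetical.

```python
PAGE_SHIFT = 11  # hypothetical: compare address bits [31:11] (2-kbyte page)

def same_page(addr_a, addr_b, page_shift=PAGE_SHIFT):
    """True when two write addresses share the same upper address bits."""
    return (addr_a >> page_shift) == (addr_b >> page_shift)

def in_page_run(write_addrs, scbpwan_accepts, page_shift=PAGE_SHIFT):
    """Number of buffered writes issued as one uninterrupted in-page sequence.

    write_addrs: write-buffer addresses, oldest first.
    scbpwan_accepts[i]: True if the slave asserts scbpwan when scbrdyn
    terminates write i (only meaningful while sctpwn is asserted).
    """
    if not write_addrs:
        return 0
    run = 1
    for i in range(len(write_addrs) - 1):
        sctpwn_asserted = same_page(write_addrs[i], write_addrs[i + 1], page_shift)
        if not (sctpwn_asserted and scbpwan_accepts[i]):
            break   # next write leaves the page, or the slave declined
        run += 1
    return run      # the run also ends when the buffer is empty
```

the loop mirrors the text: the bus interface keeps no count of in-page writes and simply stops at the first page change, the first deasserted scbpwan, or the end of the buffered writes.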
[figure 8.13 scbus in-page write transaction (four words): timing diagram of sclkp, scaop[31:0] (address1 to address4), scaoen, scdop[63:0] (data1 to data4, write cycle), scdoen, sctssn, sctben[7:0], sctpwn, sctsen, scbrdyn, and scbpwan over clock cycles t1 to t8]

8.2.4 scbus bus hold
there are two ways to hold scbus transactions:
1. external logic asserts the CW4011 bus hold input, schrqn. the CW4011 acknowledges the request by issuing the bus hold grant signal, schgtn.
2. external logic deasserts the transaction start enable signal, sctsen. because there is no dedicated acknowledge signal associated with sctsen, the CW4011 deasserts the address output enable signal, scaoen, after sctsen is deasserted to show that the bus interface does not own the bus.

the bus hold request signal, schrqn, cannot break in-page write transactions or read/write burst transactions if sctsen is asserted continuously. these transactions can be broken by deasserting sctsen. to avoid a bus deadlock, a bus retry is requested with each hold request; the current scbus transaction generated by the biu is then terminated by the retry, and the hold request must be accepted. figure 8.14 shows the timing diagram for a bus hold request and the associated grant signal, schgtn. the CW4011 asserts the grant signal until the request is deasserted. while the bus is held, the CW4011 does not detect bus errors.

[figure 8.14 scbus hold request and grant: timing diagram of sclkp, scaoen, sctssn, scbrdyn, schrqn, and schgtn over clock cycles t1 to t9. notes: (1) if the biu has a next transaction request, schrqn must be asserted before this cycle; (2) if the biu has no next transaction request, schgtn is asserted immediately; (3) minimum 1 sclk.]

8.2.5 scbus bus retry
the bus retry signal, scbrtyn, is an input to the biu. it is asserted to abort a transaction and allow it to be restarted later. the transaction state control goes to the idle state and then restarts the transaction when sctsen is asserted. bus retry is valid in a burst transaction. if scbrdyn and scbrtyn are asserted at the same time, scbrtyn has higher priority. if scbrtyn is asserted to hold the bus, schrqn should be asserted before or at the same time as scbrtyn.
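the termination priorities can be captured in a small resolver, as a testbench or bus model might use when sampling the inputs on one sclkp rising edge. scbrtyn beats scbrdyn (this section) and scberrn beats scbrdyn (section 8.2.6); the text does not rank scberrn against scbrtyn, so giving bus error the top priority is an assumption, as are the names below.

```python
from enum import Enum

class Termination(Enum):
    NONE = "transaction continues"
    READY = "normal completion"
    RETRY = "abort, restart later"
    ERROR = "bus error exception"

def resolve(scbrdyn, scbrtyn, scberrn):
    """Resolve active-low termination inputs (0 = asserted, 1 = deasserted)."""
    if scberrn == 0:
        return Termination.ERROR   # assumption: error outranks retry
    if scbrtyn == 0:
        return Termination.RETRY   # retry outranks ready
    if scbrdyn == 0:
        return Termination.READY
    return Termination.NONE
```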
8.2.6 scbus bus error
the external bus controller asserts the biu bus error signal, scberrn, when the current transaction must be terminated with a bus error. if scbrdyn is asserted at the same time, scberrn has higher priority. assertion of scberrn forces the CW4011 to exit sequential in-page write and read/write burst transactions, and the service and transaction control states go to the idle state. if the transaction is a burst (cache refill or writeback), the CW4011 invalidates the cache line. when a bus error occurs, cp0 issues a bus error exception; see section 8.1.4, bus error (scberrn), for more details. a bus error exception is a fatal error for the CW4011.

8.2.7 scbus bus sizing
the scbus supports bus sizing for slaves that need sequential address access to 32-bit data. when sizing is requested, the scb32n input to the CW4011 is asserted to split a doubleword transaction, including one that is part of a burst transaction, into two single-word transactions. the bus interface also selects the valid byte positions for a word or partial-word transaction if scb32n is asserted. for a word or partial-word write transaction, the bus interface outputs the word data on both the upper and lower 32 bits of the data output bus according to address bit 2. the bus interface therefore fully supports a 32-bit bus interface. although scb32n is sampled with the assertion of the ready input, the bus interface behaves as a normal 32-bit-wide bus if scb32n is always asserted. if 16-bit or 8-bit bus sizing is needed, it must be supported outside the CW4011 core.

8.2.7.1 read bus sizing
when sizing occurs for a byte, halfword, tribyte, or word read transaction, the CW4011 biu can move the sampled 32-bit word data to the valid position according to address bit 2.
if sizing is requested for a doubleword transaction, the biu samples 32-bit data in the first transaction, then generates a subsequent transaction and packs the first 32 bits and the subsequent 32 bits together. the packed data is sent to the isu or lsu. figure 8.15 shows the relationship between the valid byte positions of the first and subsequent transactions. for a non-doubleword read, a byte, halfword, or tribyte transaction behaves the same as a word transaction, because sizing supports 32-bit mode only. you can therefore assume that the bus interface samples a doubleword (8 bytes), as in the example in figure 8.15.

[figure 8.15 sampled bytes of first and second transaction: scbus data bytes a1 to h1 (1st transaction) and a2 to h2 (2nd transaction) on bits 63 to 0. note: the 2nd transaction is generated when the transaction is a doubleword or part of a burst with scb32n = low.]

if you read a doubleword with 32-bit bus sizing, a second transaction is needed. figure 8.16 shows the doubleword data that is sent to the isu or lsu.

[figure 8.16 read bytes to isu and lsu with sizing: example 1 (type = any, scb32n = high, address bit 2 = 0 or 1), example 2 (type = byte, half, tri, or word, scb32n = low, address bit 2 = 0 or 1), example 3 (type = doubleword, scb32n = low, address bit 2 = 0)]

1. in example 1, one transaction is initiated. the eight bytes sampled are transferred to the isu or lsu without any change.
2. in example 2, one transaction is initiated. four bytes are sampled (bits [31:0]) and transferred to both bits [63:32] and bits [31:0] of the isu or lsu.
3. in example 3, two transactions are initiated. bits [31:0] of the first transaction are output on bits [31:0], and bits [31:0] of the second transaction are output on bits [63:32]. this doubleword is transferred to the isu or the lsu.
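the three read sizing examples can be expressed as one packing function on 64-bit integers. this is a sketch of the data movement described above, not of the biu logic itself; the function name and argument layout are hypothetical, and scb32n is treated as active low (0 = sizing requested).

```python
MASK32 = 0xFFFFFFFF

def read_to_unit(first, second=None, scb32n=1, is_doubleword=False):
    """Return the doubleword passed to the isu or lsu.

    first, second: 64-bit values sampled on scdip[63:0] in the first and
    (for a sized doubleword) second transaction.
    """
    if scb32n == 1:                       # example 1: no sizing
        return first
    low = first & MASK32                  # 32-bit slave drives scdip[31:0]
    if not is_doubleword:                 # example 2: byte/half/tri/word
        return (low << 32) | low
    if second is None:
        raise ValueError("a sized doubleword read needs a second transaction")
    return ((second & MASK32) << 32) | low   # example 3: pack two words
```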
8.2.7.2 write bus sizing
for a non-doubleword write transaction, the bus interface selects the upper or lower 32 bits of data from the lsu according to address bit 2 and outputs the same 32-bit word on both halves of the scbus. the data is output to the scbus before the bus interface detects the sizing input. for a doubleword write transaction, the bus interface generates a subsequent sizing transaction if the sizing input is asserted in the first transaction. figure 8.17 shows the relationship between the doubleword data from the lsu and the scbus. bytes shown in the shaded area have no meaning for the scbus write transaction.

[figure 8.17 write bytes to the scbus with sizing (little endian): bytes h to a on bits 63 to 0 for example 1 (type = byte/half/tri/word, scb32n = high/low, address bit 2 = 0), example 2 (address bit 2 = 1, bytes d c b a on both halves), example 3 (type = double, scb32n = high), and example 4 (type = double, scb32n = low, 1st and 2nd transactions)]

1. in example 1, one transaction is initiated. a doubleword from the lsu is output on the data bus without any change.
2. in example 2, one transaction is initiated. the selected 32-bit word from the lsu is output on both bits [63:32] and bits [31:0] of the data bus.
3. in example 3, one transaction is initiated. a doubleword from the lsu is output on the data bus without any change.
4. in example 4, two transactions are initiated. in the first transaction, a doubleword from the lsu is output on the data bus without any change. in the second transaction, bits [63:32] from the first transaction are output on bits [31:0] of the data bus.
as shown in figure 8.18, you can assume that the lsu sends a doubleword regardless of the transaction type.

[figure 8.18 write data bytes from lsu: bytes a to h on bits 63 to 0]

8.2.8 scbus bus lock
the CW4011 sclockn output signal indicates that the CW4011 is requesting locked ownership of the scbus. the CW4011 asserts sclockn when it executes a load linked instruction that starts a read transaction in an uncached area or a writethrough cached area. it deasserts the signal just before it executes a store conditional instruction to start a write transaction. during the read and write transactions, the CW4011 asserts sclockn continuously.

if the effective address of a load linked instruction is in a writeback cached area, the CW4011 does not assert sclockn, even if the load misses the d-cache. the subsequent store conditional instruction may not generate a write transaction, because it may hit the d-cache. if a store conditional instruction hits the d-cache in a writeback cached area while sclockn is asserted, an incorrect condition occurs, and sclockn is deasserted without any bus transactions being executed. the effective virtual addresses of load linked and store conditional instructions must be in kseg1. additionally, a load linked instruction and a store conditional instruction must be used as a pair of instructions to the same address.

while the CW4011 asserts sclockn, the bus interface does not exhibit any special behavior; for example, it still accepts hold requests. if hold requests must not be accepted while the CW4011 is asserting sclockn, off-core user logic must mask the hold request using sclockn. figure 8.19 shows the timing behavior for locked transactions. if there are other transactions between the read transaction of a load linked and the write transaction of a store conditional, the CW4011 asserts sclockn continuously.
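the sclockn rules and the off-core hold-request mask can be sketched as follows. the area labels and function names are hypothetical; the mask is one possible gate (an or of the external request with inverted sclockn, all signals active low), not a circuit taken from the manual.

```python
def ll_asserts_sclockn(area):
    """area: hypothetical label 'uncached', 'writethrough', or 'writeback'."""
    return area in ("uncached", "writethrough")   # never for writeback areas

def masked_schrqn(schrqn_ext, sclockn):
    """Active-low hold request forwarded to the CW4011 (0 = asserted).

    While sclockn is asserted (0), the output is forced high (deasserted),
    so an external master cannot take the bus during a locked sequence.
    """
    return schrqn_ext | (sclockn ^ 1)
```

in hardware this is a single two-input gate between the external arbiter and the schrqn pin; whether such masking is wanted depends on the system, as the text notes.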
[figure 8.19 scbus locked transaction: timing diagram of sclkp, scaoen, sctssn, scdoen, sclockn, and scbrdyn over clock cycles t1 to t8, showing the read transaction of a load linked and the write transaction of a store conditional]

8.2.9 big-endian configuration
the CW4011 can support big-endian address ordering, although the default configuration is little-endian. to enable big-endian mode, the bendn input is strapped low. table 8.3 lists the names arbitrarily used here to describe the off-core address bus, data bus, and byte enable signals of the big-endian configuration. because these signals are defined outside the CW4011 core, their actual names are determined by the designer's choice of off-core logic.

table 8.3 big-endian arbitrary signal names
signals | big-endian signals
address bus | bige_aip[31:0], bige_aop[31:0]
data bus | bige_dip[63:0] (note 1), bige_dop[63:0]
byte enables | bige_ben[7:0]
note 1: bige_dip[63] is the most-significant bit of a doubleword.
table 8.4 lists the bige_ben[7:0] bits and their corresponding valid bige_dip and bige_dop bits. these bit assignments differ from those of the scdip[63:0], scdop[63:0], and sctben[7:0] signals. in big-endian mode, the data bus and byte enable signals outside the CW4011 must be redefined; the most important of these is the definition of the byte enables. table 8.5 shows the byte enable and data bus connections. the address bus bit assignments are not shown, but they are direct connections from bige_aip[31:0] to scaip[31:0] and from bige_aop[31:0] to scaop[31:0].

table 8.4 big-endian valid bytes
bige_ben[7:0] bit | byte valid
0 | bige_dip[7:0] or bige_dop[7:0]
1 | bige_dip[15:8] or bige_dop[15:8]
2 | bige_dip[23:16] or bige_dop[23:16]
3 | bige_dip[31:24] or bige_dop[31:24]
4 | bige_dip[39:32] or bige_dop[39:32]
5 | bige_dip[47:40] or bige_dop[47:40]
6 | bige_dip[55:48] or bige_dop[55:48]
7 | bige_dip[63:56] or bige_dop[63:56]
table 8.5 data bus and byte enable connections
signal | big-endian | connection
data bus | bige_dip[63:32] | scdip[31:0]
data bus | bige_dip[31:0] | scdip[63:32]
data bus | bige_dop[63:32] | scdop[31:0]
data bus | bige_dop[31:0] | scdop[63:32]
byte enables | bige_ben[0] | sctben[7]
byte enables | bige_ben[1] | sctben[6]
byte enables | bige_ben[2] | sctben[5]
byte enables | bige_ben[3] | sctben[4]
byte enables | bige_ben[4] | sctben[3]
byte enables | bige_ben[5] | sctben[2]
byte enables | bige_ben[6] | sctben[1]
byte enables | bige_ben[7] | sctben[0]

the data bus configuration in table 8.5 must be defined outside the CW4011 core. table 8.6 shows the different CW4011 data transactions through these buses.
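the table 8.5 wiring amounts to swapping the two 32-bit halves of the data bus and bit-reversing the byte enables. the sketch below models that wiring in software; the function names are hypothetical, and the address buses (straight-through connections) are not modelled. because both mappings are their own inverse, the same functions also serve the input direction (scdip and bige_dip).

```python
MASK32 = 0xFFFFFFFF

def swap_halves(dw):
    """Swap the 32-bit halves of a 64-bit data bus value (table 8.5)."""
    return ((dw & MASK32) << 32) | ((dw >> 32) & MASK32)

def reverse_ben(ben):
    """bige_ben[i] = sctben[7 - i] for i = 0..7 (an 8-bit bit reversal)."""
    return sum(((ben >> (7 - i)) & 1) << i for i in range(8))

def core_to_bige(scdop, sctben):
    """Map core outputs to the hypothetical off-core big-endian bus."""
    return swap_halves(scdop), reverse_ben(sctben)
```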
table 8.6 CW4011 accesses through off-core buses
data type | bige_aop[2:0] value | bige_ben[7:0] value | valid data bige_dop | valid data bige_dip
byte | 000 | 01111111 | [63:56] | [63:56]
byte | 001 | 10111111 | [55:48] | [55:48]
byte | 010 | 11011111 | [47:40] | [47:40]
byte | 011 | 11101111 | [39:32] | [39:32]
byte | 100 | 11110111 | [31:24] | [31:24]
byte | 101 | 11111011 | [23:16] | [23:16]
byte | 110 | 11111101 | [15:8] | [15:8]
byte | 111 | 11111110 | [7:0] | [7:0]
half-word | 000 | 00111111 | [63:48] | [63:48]
half-word | 010 | 11001111 | [47:32] | [47:32]
half-word | 100 | 11110011 | [31:16] | [31:16]
half-word | 110 | 11111100 | [15:0] | [15:0]
tribyte | 000 | 00011111 | [63:40] | [63:40]
tribyte | 001 | 10001111 | [55:32] | [55:32]
tribyte | 100 | 11110001 | [31:8] | [31:8]
tribyte | 101 | 11111000 | [23:0] | [23:0]
word | 000 | 00001111 | [63:32] | [63:32]
word | 100 | 11110000 | [31:0] | [31:0]
doubleword | 000 | 00000000 | [63:0] | [63:0]
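every row of table 8.6 follows one rule: byte offset k on the big-endian bus occupies lane 7 - k, so an access of size bytes at offset clears the active-low enables from bige_ben[7 - offset] down to bige_ben[8 - offset - size]. the sketch below regenerates the table entries from that rule; the function names are hypothetical, and it does not enforce the specific alignments the table lists (for example, it would accept a tribyte at 010).

```python
SIZES = {"byte": 1, "half-word": 2, "tribyte": 3, "word": 4, "doubleword": 8}

def bige_ben(offset, size):
    """Return bige_ben[7:0] as an 8-character string, msb first."""
    if not (0 <= offset and offset + size <= 8):
        raise ValueError("access must stay within one doubleword")
    bits = ["1"] * 8               # string position 0 is bige_ben[7]
    for k in range(offset, offset + size):
        bits[k] = "0"              # offset k -> lane 7 - k -> string position k
    return "".join(bits)

def bige_valid_bits(offset, size):
    """Return (msb, lsb) of the valid bige_dop/bige_dip field."""
    msb = 63 - 8 * offset
    return msb, msb - 8 * size + 1
```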
8.3 ocabus interface behavior
the CW4011 on-chip access (oca) bus enables access to on-chip modules at the cr stage without going through the scbus. section 7.4, ocabus interface, provides additional information about the bus. this section describes the following ocabus transactions in detail and provides timing diagrams for each:
- basic oca access
- rejection of oca access
- ocabus access with stall at the ex pipeline stage
- ocabus access with stall at the cr pipeline stage
- ocabus access with stall request or wait state
- ocabus access with pipeline cancellation

8.3.1 basic ocabus transaction
regardless of the type of load or store execution, the address and size are output at the ex stage of the CW4011 pipeline, and exloadp or accstorep is asserted. the address bits (dvaddrp[31:0]) must be decoded to determine whether the oca module can accept the oca transaction, that is, whether ocacceptp is asserted. typically, oca modules should be located at uncached addresses, so that their virtual addresses are in kseg1; this is done by setting address bits [31:29] to 0b101.

the address bus must be latched on the rising edge of the system clock between the ex and cr stages, as shown in figure 8.20. the size information provided by accsizep is also latched at this time. refer to the subsection entitled "accsizep[1:0] ocabus transaction size output" on page 7-13 for more information. at the cr stage, write data is output on cptocdp, provided that cptocen is asserted. if a read transaction is being executed, the cpfrcen signal is asserted, and data on the cpfrcdp bus is sampled on the rising edge of the system clock between the cr and wb stages. the ocacceptp signal must be asserted in the cr stage to inform the CW4011 that an oca transaction is in progress.
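an oca module's address decoder might look like the sketch below. the kseg1 test (dvaddrp[31:29] = 0b101) comes from the text; oca_base and oca_size describe a hypothetical register window chosen only for illustration, and the decoded result is the level the module would drive on ocacceptp during cr for the address latched at the end of ex.

```python
OCA_BASE = 0xBF000000   # hypothetical oca module base address, inside kseg1
OCA_SIZE = 0x1000       # hypothetical 4-kbyte register window

def in_kseg1(va):
    """kseg1 addresses have bits [31:29] equal to 0b101."""
    return ((va >> 29) & 0b111) == 0b101

def ocacceptp(dvaddrp, is_access):
    """is_access: exloadp or accstorep was asserted for this address."""
    return bool(is_access) and in_kseg1(dvaddrp) and \
        OCA_BASE <= dvaddrp < OCA_BASE + OCA_SIZE
```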
the crvalidp signal is asserted to indicate that the cr stage is valid. if it is deasserted, write data must not be written and read data must not be sampled; the transaction is executed again later.

[figure 8.20 typical ocabus transaction: timing diagram of sclkp, dvaddrp[31:0] and accsizep[1:0], cptocdp[31:0] (write cycle), cpfrcdp[31:0] (read cycle), cptocen or cpfrcen, exloadp or accstorep, crvalidp, pstalln, ocacceptp, and cpsreqn[3:1] across the rd, ex, cr, and wb pipeline stages]

8.3.2 ocabus transaction rejected
figure 8.21 shows the timing for an ocabus transaction that is rejected because ocacceptp is deasserted during the cr stage. this occurs when the virtual address is decoded and found not to belong to an oca module. under these conditions, the CW4011 performs a normal memory access instead: it reads from the d-cache or requests an scbus read transaction, or it writes data to the d-cache write buffer or to the four-deep external write buffer. ocacceptp is the only signal that determines whether an oca transaction takes place.
[figure 8.21 ocabus transaction rejected by address decoder: timing diagram of the same signals as figure 8.20 across the rd, ex, cr, and wb pipeline stages]

8.3.3 ocabus access with stall at ex stage
figure 8.22 shows an example in which pstalln is asserted at the ex stage of the CW4011 pipeline, causing all pipeline stages to enter a stall state. when this happens, dvaddrp[31:0], accsizep[1:0], and exloadp or accstorep are held during the stall cycles.
[figure 8.22 ocabus with stall at ex stage: timing diagram of the same signals as figure 8.20, with the pipeline stages rd, ex, ex, ex, cr, wb]

8.3.4 ocabus access with stall at cr stage
figure 8.23 shows an example in which pstalln is asserted at the cr stage of the CW4011 pipeline, causing all pipeline stages to enter a stall state. when this happens, the data on the cptocdp bus, crvalidp, and ocacceptp are held.
[figure 8.23 ocabus access with stall at cr stage: timing diagram of the same signals as figure 8.20, with the pipeline stages rd, ex, cr, cr, cr, wb]

8.3.5 ocabus access with stall request
figure 8.24 shows an example in which the oca device needs to insert some wait cycles before a read or write operation. to request a pipeline stall, the oca device asserts cpsreqn from the beginning of the cr stage, which causes pstalln to be asserted. because cpsreqn is one of the critical path signals, it must be asserted and deasserted early in the clock cycle.
[figure 8.24 ocabus access with stall request: timing diagram of the same signals as figure 8.20, with the pipeline stages rd, ex, cr, cr, cr, wb]

8.3.6 ocabus access with pipeline cancel
figure 8.25 shows an example in which a load or store instruction is cancelled by an exception. the exception is indicated when crvalidp is deasserted. when this happens, the write data must not be written into the oca module, and the read data being transferred to the CW4011 core is ignored. the cancelled load or store instruction may be executed later.
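the gating rules of sections 8.3.1 to 8.3.6 can be sketched as a register inside an oca module, evaluated on the sclkp rising edge at the end of the cr stage. the class is hypothetical, and committing the write only on the edge where the cr stage advances (pstalln deasserted) is an assumption made so that a stalled access is not written more than once.

```python
class OcaRegister:
    """Hypothetical 32-bit register inside an oca module."""

    def __init__(self):
        self.value = 0

    def clock(self, ocacceptp, crvalidp, pstalln, cptocen, cptocdp):
        """Evaluate one sclkp rising edge at the end of the cr stage.

        Levels are 1/0; pstalln is active low (0 = pipeline stalled).
        Returns True when the access completes on this edge.
        """
        if not (ocacceptp and crvalidp):
            return False            # rejected, or cancelled by an exception
        if pstalln == 0:
            return False            # stalled: cr-stage data is held
        if cptocen:
            self.value = cptocdp & 0xFFFFFFFF   # write cycle
        return True
```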
[figure 8.25 ocabus access with pipeline cancel: timing diagram of the same signals as figure 8.20 across the rd, ex, cr, and wb pipeline stages]

8.4 cache interface behavior
when an external bus master writes data into main memory, it can invalidate d-cache and i-cache lines to maintain coherency between main memory and the caches. the CW4011 has three signals to support this function:
- cache invalidate address bus bits (cinvap[31:5])
- d-cache invalidate strobe (dcinvsn)
- i-cache invalidate strobe (icinvsn)

when dcinvsn or icinvsn is asserted, the address on the cinvap bus is latched, and the CW4011 starts an invalidation process. dcinvsn or icinvsn should be asserted for only one clock cycle. a d-cache or i-cache line is invalidated when the line is valid and its physical address tag matches the latched invalidate address; both the v bit and the wb bit are then cleared.

figure 8.26 shows the timing diagram for d-cache invalidation implemented by bus snooping. in the first clock cycle after dcinvsn is asserted, the lsu asserts the stall request signal if the ex stage holds a load/store instruction. in the second cycle, the d-cache tag is read and compared with the latched cinvap address. if they match and the ex stage holds a load/store instruction, the pipeline stall request is asserted. to avoid timing problems, pstalln may not be deasserted during the second cycle. in the third cycle, the v bit and wb bit of the d-cache line are cleared and the line is invalidated. if the addresses do not match at the third cycle, the d-cache is not accessed. the stall cycle signal (pstalln) is asserted at the third clock cycle even if the address and d-cache tag do not match and the valid bit is not cleared.
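the tag comparison and bit clearing can be sketched with a deliberately simplified cache model. the dictionary stands in for the real index and tag arrays of the CW4011 caches, which this sketch does not model; the key is the 27-bit line address carried on cinvap[31:5] (32-byte lines), and the function name is hypothetical.

```python
def snoop_invalidate(cache, cinvap):
    """Clear v and wb for the valid line whose tag matches cinvap[31:5].

    cache: {line_address: {"v": bool, "wb": bool}}, where line_address is
    the physical address shifted right by 5. Returns True on invalidation.
    """
    line = cache.get(cinvap & 0x7FFFFFF)   # 27 address bits, [31:5]
    if line is None or not line["v"]:
        return False                       # miss or invalid line: no change
    line["v"] = False
    line["wb"] = False
    return True
```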
[figure 8.26 d-cache invalidation by snooping: timing diagrams of sclkp, cinvap[31:5] (address a), dcinvsn, pstalln, the rd/ex/cr/wb stages of instructions 1 to 3, the d-cache tag address, and the d-cache tag access (read/write) over the 1st to 4th cycles after dcinvsn; case 1, no stall cycle; case 2, two stall cycles; case 3, one stall cycle (no invalidation)]
the lsu does not take any action if an external bus master reads data from main memory while the address is dirty in the CW4011 d-cache. in this case, you may use writethrough mode for the page.

figure 8.27 shows the timing for i-cache invalidation caused by bus snooping. it needs a two-cycle stall if the invalidation address hits the tag and a one-cycle stall if it does not.

[figure 8.27 i-cache invalidation by snooping: timing diagrams of sclkp, cinvap[31:5], icinvsn, and pstalln over the 1st to 4th cycles; case 1, one stall cycle (miss); case 2, two stall cycles (hit)]
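the stall counts for both caches can be summarized in two small functions. the i-cache rule is stated directly above; the d-cache mapping (no stall without a load/store in ex, otherwise one stall on a tag miss and two on a hit) is an interpretation of the three cases in figure 8.26 and the cycle-by-cycle description in this section, and the function names are hypothetical.

```python
def dcache_snoop_stalls(ex_is_load_store, tag_hit):
    """Stall cycles caused by one dcinvsn strobe (interpretation of fig. 8.26)."""
    if not ex_is_load_store:
        return 0                      # case 1: no stall cycle
    return 2 if tag_hit else 1        # case 2 (invalidate) / case 3 (no match)

def icache_snoop_stalls(tag_hit):
    """Stall cycles caused by one icinvsn strobe (figure 8.27)."""
    return 2 if tag_hit else 1        # case 2 (hit) / case 1 (miss)
```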
chapter 9 iceport

this chapter outlines the serialice scan interface and describes in detail the CW4011 iceport building block. it is divided into the following sections:
section 9.1, overview
section 9.2, iceport features
section 9.3, iceport functional blocks
section 9.4, iceport signals
section 9.5, iceport registers
section 9.6, iceport operations
section 9.7, iceport pin buffers and drivers

9.1 overview

the iceport is a full-duplex serial uart receive and transmit port building block available from lsi logic. the core designer uses the iceport both to download core application software and as a CW4011 debugging tool. the iceport works with an icecontroller at baud rates up to 1 mbit/s, providing 800 kbits of data per second. figure 9.1 shows a block diagram of a CW4011 system with the iceport installed. in lsi logic's lr4500 chip, the CW4011 iceport is integrated with the sclc and sdramc modules on the core scbus. if desired, the iceport can connect directly to the scbus without the sclc module.
figure 9.1 CW4011 design with iceport (block diagram: CW4011 core, iceport, sdramc, and sclc on the scbus, with scbus control signals; iceport rx input, tx output, serial clock, and rx interrupt; core interrupt)

9.2 iceport features

the iceport provides the following features:
full-duplex operation.
a clock at 16 times the transfer bit rate defines the receive (rx) and transmit (tx) rates. this clock is common to rx and tx, and may be either an external clock or one generated internally from the system clock.
an rx ready signal indicates that a byte of data has been received and is in the data byte input buffer.
separate status and data registers for rx and tx. the rx status register contains one bit that indicates received data is in the iceport, and one bit that indicates an overrun in the rx input buffer. the tx status register contains one bit that indicates the iceport is ready to transmit data.
the serial receive and clock inputs do not require an active signal when the iceport is unused.
during reset, the tx uart port defaults to an idle state and transmits an idle signal.
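the relationship between the serial bit rate, the 16x iceclkp clock, and the payload rate quoted in section 9.1 is simple arithmetic: each frame carries 8 data bits in 10 bit times (start, data, stop; see section 9.6.3), so 1 mbit/s on the line yields 800 kbit/s of data. a minimal sketch; the function names are hypothetical.

```c
#include <stdint.h>

/* frame format from section 9.6.3: 1 start + 8 data + 1 stop bit */
enum { FRAME_BITS = 10, DATA_BITS = 8, ICECLK_PER_BIT = 16 };

/* iceclkp frequency required for a given serial bit rate (16x) */
uint32_t iceclkp_hz_for_baud(uint32_t baud)
{
    return baud * ICECLK_PER_BIT;
}

/* payload bits per second when frames are sent back to back */
uint32_t payload_bits_per_s(uint32_t baud)
{
    return (uint32_t)((uint64_t)baud * DATA_BITS / FRAME_BITS);
}
```

at 1 mbit/s this gives a 16 mhz iceclkp and 800000 bit/s of payload; at 115200 baud the same rule gives a 1.8432 mhz iceclkp and 92160 bit/s of payload.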
9.3 iceport functional blocks

the CW4011 iceport design is partitioned into three logical blocks:
receive and transmit logic block, which receives the icerxp signal and sends the icetxp signal.
generic interface logic block, common to most core designs that implement a serialice iceport.
scbus interface logic block, which connects the iceport to the rest of the CW4011 core through the scbus signals.
figure 9.2 illustrates these blocks, their interactions with each other, and their interfaces to other cores and external logic.

figure 9.2 CW4011 iceport block diagram (iceport building block containing scbus glue logic (iceport select, read/write, ready generation, address decode, control), the generic interface, and the receive (rx) and transmit (tx) blocks; signals scaop[31:0], scdoen, sctssn, sc_icedip[7:0], sc_icedop[31:0], sc_icedoep, sc_icerdyp, sc_iceintp, cresetn, wresetn, sclkp, iceclkp, icerxp, icetxp, and irxrdyp)
9.3.1 receive and transmit interface logic

the two right-most blocks in figure 9.2 form the serial interface, specifically the receive (rx) and transmit (tx) blocks. the rx block receives the icerxp bit stream, and the tx block transmits the icetxp bit stream. both blocks receive the internal cpu clock (sclkp) and the external x16 bit rate clock (iceclkp), and both synchronize timing between the iceclkp and sclkp clock domains. all interface signals between the rx and tx blocks and the generic interface are synchronized to sclkp, since the generic interface block runs on sclkp only.

9.3.2 generic interface logic

the center generic interface block connects the tx and rx blocks to a specific core bus interface, which for the CW4011 is the scbus. the only signal the iceport outputs directly is irxrdyp, which must be enabled in the rx setup register. when enabled, irxrdyp indicates that rx data has been received. irxrdyp is tied to the processor interrupt signal (sc_iceintp) and may be used for interrupt generation as described in section 9.6.4.1, receive (rx).

9.3.3 scbus interface logic

the left-most block in figure 9.2 is the scbus interface logic block. this block connects the generic interface to the CW4011 scbus signals. the scbus is the main internal CW4011 bus, through which a bus master exchanges information with the core. in scbus transactions, the iceport decodes the scbus address bus and checks the transaction start signal (sctssn) to determine whether the current scbus transaction involves the iceport. if it does, the scbus interface logic block either places the appropriate data on the data bus or writes data into an iceport internal register, depending on whether the operation is a read or a write. once the transaction is complete, the iceport asserts the acknowledge signal (sc_icerdyp) and the scbus interface logic block resumes monitoring scbus transactions.
note that the iceport follows a different scbus protocol than other CW4011 core components. the iceport uses only a subset of the scbus signals and combines several scbus acknowledge signals into a single iceport signal. see section 9.4.1, monitored scbus signals, and section 9.4.2, other scbus signals, for more information on iceport/scbus interaction.
9.4 iceport signals

this section describes the signals that make up the bit-level interface of the iceport. the signal descriptions use the following conventions:
the signals are described in alphabetical order by mnemonic within each functional group.
each signal definition contains the mnemonic and the full signal name.
the mnemonics for signals that are active high, or for clock signals with a positive rising edge, end with p; signals that are active low end with n.
the term assert means to drive true or active; deassert means to drive false or inactive.
input and output in the signal headings are relative to the iceport, not to the core. for example, sctssn is a core output, but because it is an iceport input, it is labeled input.

all input signals except icerxp and iceclkp are sampled on the positive edge of sclkp and must therefore be generated synchronously with sclkp. all output signals except icetxp are also generated synchronously on the rising edge of sclkp. icetxp is synchronous to the rising edge of iceclkp, except during reset, when icetxp is asserted asynchronously to iceclkp.

in normal serial transmit and receive through the iceport, iceclkp runs at 16 times the serial bit rate, so iceclkp defines the width of each uart serial bit: the iceport assumes that each serial bit, for both receive and transmit, lasts 16 iceclkp cycles.

figure 9.3 summarizes the iceport signals; detailed descriptions follow the figure. note that the scbus master can be either the sclc module or the CW4011 processor. external logic refers to logic not related to the CW4011 core, the sclc, or the iceport.
figure 9.3 iceport logic diagram (signal groups: monitored scbus signals cresetn, wresetn, scaop[31:0], scdoen, sctssn; other scbus signals sc_icedip[7:0], sc_icedop[31:0], sc_icedoep, sc_icerdyp, sc_iceintp; iceport scan and clocking signals sclkp, iceclkp, icerxp, icetxp, se, si, so, testmp)

9.4.1 monitored scbus signals

this section lists the scbus signals that the iceport monitors and outlines how the iceport uses them. for a more complete description of these signals, see chapter 7, CW4011 signals.

cresetn  cold reset  input
asserting cresetn asynchronously resets the iceport and all iceport registers. cresetn and wresetn are merged internally in the iceport.

wresetn  warm reset  input
asserting wresetn asynchronously resets the iceport and all iceport registers. cresetn and wresetn are merged internally in the iceport.

scaop[31:0]  scbus address bus  input
scaop[31:0] is the address bus. the iceport monitors this bus and sctssn for read/write operations involving the iceport. when an scbus transaction involves the iceport, the iceport decodes scaop[31:0] to determine which internal register the transaction targets.

scdoen  scbus data output enable  input
the value of scdoen determines whether the present scbus transaction is a write or a read: for a write, scdoen is driven low; for a read, scdoen is driven high. the iceport monitors scdoen so that it can perform the correct action for either a read or a write.

sctssn  scbus transaction start signal  input
the core asserts sctssn for one clock cycle at the beginning of a transaction to announce that a new transaction has begun. assertion of sctssn together with a valid scaop[31:2] address initiates an iceport read/write operation.

9.4.2 other scbus signals

these signals enable iceport read and write operations and transfer the data for these operations.

sc_icedip[7:0]  scbus input data bus  input
this is the scbus input data bus. for write operations to the iceport, data is transferred to the iceport on this bus. on the same positive edge of sclkp that asserts sc_icerdyp, the core writes data into the iceport.

sc_icedop[31:0]  scbus output data bus  output
this is the scbus output data bus. for read operations from the iceport, the iceport places data onto this bus. data on this bus is valid for one clock cycle, and only while sc_icedoep is asserted.

sc_icedoep  scbus output data valid  output
asserting this signal indicates that the sc_icedop[31:0] bus is valid during the current cycle. sc_icedoep is asserted for read operations only and lasts only one sclkp cycle.

sc_icerdyp  iceport ready  output
asserting this signal high informs the core or the sclc module that the current transaction on the scbus has finished. sc_icerdyp encompasses both the scb32n and scbrdyn scbus control signals.

sc_iceintp  iceport interrupt  output
if this signal is enabled by the rxrxrdype bit in the rx setup register, the iceport asserts sc_iceintp once it receives a valid byte of off-chip data. in the lr4500, the sc_iceintp output is sent to the sclc module, which then generates an interrupt to the core. sc_iceintp is also referred to as irxrdyp in this document, since sc_iceintp is tied to the irxrdyp signal of the iceport generic interface.

9.4.3 iceport scan and clocking signals

these signals are the clocking and scan i/o signals for the iceport.

sclkp  system clock  input
sclkp is the global system clock input from the CW4011 core.

iceclkp  ice serial bit clock rate x16  input
the iceport requires this off-chip signal to have a frequency of 16 times the serial transmit/receive rate. the iceport assumes each serial receive/transmit bit is 16 iceclkp cycles long.

icerxp  rx serial bit receive  input
this off-chip input carries the uart serial input data stream. each received bit is 16 iceclkp cycles long.

icetxp  tx serial bit transmit  output
this off-chip output carries the uart serial output data stream. each transmitted bit is 16 iceclkp cycles long.

se  scan test mode enable  input
asserting this signal high enables the scan chain; deasserting se disables scan operation. testmp must also be continuously asserted to enable the full scan test.

si  scan test input  input
si is the scan chain data input signal.

so  scan test output  output
so is the scan chain data output signal.

testmp  scan test setup  input
this signal sets up the scan test so that scan mode is possible in the sclkp clock domain. testmp must be asserted continuously to enable the scan test.
the iceclkp signal is ignored while testmp enables the scan test mode.

9.5 iceport registers

all iceport registers are memory mapped as shown in table 9.1. the default iceport virtual base address is 0xb0ff0000 (physical address 0x10ff0000). users can customize the iceport address by altering the addresses in the hdl models. however, the last nibble (bits [3:0]) must be kept the same, since these four bits determine which iceport register is accessed. the addresses must also be both unmapped, to prevent an installed mmu from remapping them, and uncached, to maintain data coherency. for these reasons, lsi logic suggests using the unmapped and uncached kseg1 memory space.

all register read transactions return zeroes for bits [31:8] and data for bits [7:0]. for read operations, the register bits are mapped with scdip[31] to sc_icedop[31], and so on. for write operations, the register bits are mapped with scdop[7] to sc_icedip[7], and so on. during write operations, data on scdop[31:8] is ignored. write transactions to read-only registers are ignored, and read transactions from write-only registers return undefined data. all registers must be accessed using word accesses only, to avoid conflicts between big-endian and little-endian data structures and to avoid partial update problems.

table 9.1 iceport registers

register    physical address  virtual address  reference page
rx status   0x10ff0000        0xb0ff0000       9-10
rx setup    0x10ff0000        0xb0ff0000       9-11
rx data     0x10ff0004        0xb0ff0004       9-11
tx status   0x10ff0008        0xb0ff0008       9-12
tx data     0x10ff000c        0xb0ff000c       9-12
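the register selection rule above (bits [3:0] pick the register, and the access direction separates the read-only rx status register from the write-only rx setup register at the same address) can be modeled as a decode function. this is a behavioral sketch, not the hdl; it assumes the default base address and a full compare of bits [31:4], and the names ice_decode, kseg1_to_phys, and ice_reg are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define ICEPORT_PHYS_BASE 0x10FF0000u  /* default from table 9.1 */

typedef enum {
    ICE_NONE,       /* not the iceport, write ignored, or read undefined */
    ICE_RX_STATUS,  /* offset 0x0, read-only  */
    ICE_RX_SETUP,   /* offset 0x0, write-only */
    ICE_RX_DATA,    /* offset 0x4, read-only  */
    ICE_TX_STATUS,  /* offset 0x8, read-only  */
    ICE_TX_DATA     /* offset 0xc, write-only */
} ice_reg;

/* kseg1 (0xa0000000-0xbfffffff) is unmapped: physical = virtual & 0x1fffffff */
uint32_t kseg1_to_phys(uint32_t vaddr)
{
    return vaddr & 0x1FFFFFFFu;
}

ice_reg ice_decode(uint32_t paddr, bool is_write)
{
    if ((paddr & ~0xFu) != ICEPORT_PHYS_BASE)
        return ICE_NONE;
    switch (paddr & 0xFu) {        /* bits [3:0] select the register */
    case 0x0: return is_write ? ICE_RX_SETUP : ICE_RX_STATUS;
    case 0x4: return is_write ? ICE_NONE : ICE_RX_DATA;
    case 0x8: return is_write ? ICE_NONE : ICE_TX_STATUS;
    case 0xC: return is_write ? ICE_TX_DATA : ICE_NONE;
    default:  return ICE_NONE;     /* not a word-aligned register address */
    }
}
```

for example, a software write to virtual address 0xb0ff000c maps to physical 0x10ff000c and selects the tx data register, while a read of the same address returns undefined data.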
note that each bit field within a register is described by mnemonic in the register figure, and by mnemonic, full name, bit number, and read/write status in the bit field description that follows.

9.5.1 rx status register

the read-only rx status register provides status information for iceport receive operations and indicates the state of the rx data register. figure 9.4 shows the rx status register.

figure 9.4 rx status register (bits [31:2] r, reserved; bit 1 rxoverrun; bit 0 rxrdy)

r  reserved  [31:2]
these bits are reserved for use by lsi logic and are read as zeroes.

rxoverrun  rx overrun  1
this bit is set to one when an rx overrun error occurs. an rx overrun error occurs when a new rx byte is received, as indicated by rxrdy, before the previous rx byte has been read. on an overrun error, the new byte is not accepted and the pending byte in the rx data register is not lost. when rxoverrun is set, it signals that at least one byte (the new frame) from the serial input stream has been lost. rxoverrun is cleared when the rx status register is read. this ensures that if another overrun occurs between the rx status register read and the rx data register read, that overrun sets rxoverrun again. rxoverrun clears to zero during an iceport reset.

rxrdy  rx byte ready  0
when the rx block receives a byte, this bit is set to one. rxrdy clears to zero when the rx data register is read, and at reset. the irxrdyp (sc_iceintp) output signal, if enabled, reflects the state of the rxrdy bit.
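the read order matters: reading rx status first clears rxoverrun, so an overrun that happens between the status read and the data read is still reported on the next poll. a minimal polling sketch, assuming the word offsets of table 9.1 and a caller-supplied register base (0xb0ff0000 on real hardware, or a plain array when testing); ice_try_receive and ice_rx_result are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* word (32-bit) offsets from table 9.1: byte offsets 0x0 and 0x4 */
enum { RX_STATUS_WORD = 0, RX_DATA_WORD = 1 };

#define RXRDY_BIT     (1u << 0)   /* rx status bit 0 */
#define RXOVERRUN_BIT (1u << 1)   /* rx status bit 1 */

typedef struct {
    bool    have_byte;  /* rxrdy was set, so byte is valid */
    bool    overrun;    /* at least one frame was dropped */
    uint8_t byte;       /* rxdata, bits [7:0] */
} ice_rx_result;

/* poll once; base points at the iceport registers (word accesses only) */
ice_rx_result ice_try_receive(volatile uint32_t *base)
{
    ice_rx_result r = { false, false, 0 };
    uint32_t status = base[RX_STATUS_WORD];      /* clears rxoverrun */
    r.overrun = (status & RXOVERRUN_BIT) != 0;
    if (status & RXRDY_BIT) {
        r.byte = (uint8_t)(base[RX_DATA_WORD] & 0xFFu);  /* clears rxrdy */
        r.have_byte = true;
    }
    return r;
}
```

on the target, the call would be something like ice_try_receive((volatile uint32_t *)0xB0FF0000u), using the uncached kseg1 address recommended above.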
9.5.2 rx setup register

the write-only rx setup register enables and disables assertion of the sc_iceintp interrupt signal when the rxrdy bit in the rx status register is set. if software clears the rxrxrdype bit to zero, the iceport interrupt signal sc_iceintp is disabled. this feature allows software to disable the interrupt signal irxrdyp (sc_iceintp) when irxrdyp is tied to a nonmaskable interrupt (nmi) input. figure 9.5 shows the rx setup register, with bit field descriptions following the figure.

figure 9.5 rx setup register (bits [31:1] r, reserved; bit 0 rxrxrdype)

r  reserved  [31:1]
these bits are reserved for lsi logic, and any writes to these bits are ignored.

rxrxrdype  sc_iceintp (irxrdyp) enable  0
when this bit is set to one, the sc_iceintp signal reflects the state of the rxrdy bit in the rx status register. when software clears rxrxrdype to zero, sc_iceintp remains deasserted. rxrxrdype clears to zero during an iceport reset.

9.5.3 rx data register

the read-only rx data register, shown in figure 9.6, holds received data in bits [7:0]. rxdata is valid only when the rxrdy bit in the rx status register is set. the rx data register is undefined after an iceport reset.

figure 9.6 rx data register (bits [31:8] r, reserved; bits [7:0] rxdata)

r  reserved  [31:8]
these bits are reserved for lsi logic and are read as zeroes.
rxdata  received bit stream  [7:0]
this bit field holds data received from the icerxp serial input signal. data held in rxdata is valid only when the rxrdy bit in the rx status register is set. rxdata is undefined after an iceport reset.

9.5.4 tx status register

the read-only tx status register, shown in figure 9.7, provides status information for tx operations.

figure 9.7 tx status register (bits [31:1] r, reserved; bit 0 txrdy)

r  reserved  [31:1]
these bits are reserved for lsi logic and are read as zeroes.

txrdy  tx ready  0
this bit is set to one when the tx data register is ready for the next transmit byte, and after reset. txrdy remains set during and after the transmission of tx data. txrdy clears to zero during a write to the tx data register. txrdy is set to one after an iceport reset.

9.5.5 tx data register

the write-only tx data register, shown in figure 9.8, holds the serial transmission data.

figure 9.8 tx data register (bits [31:8] r, reserved; bits [7:0] txdata)

r  reserved  [31:8]
these bits are reserved for lsi logic, and any writes to these bits are ignored.
txdata  transmitted bit stream  [7:0]
when the txrdy bit in the tx status register is set, data to be transmitted on icetxp may be written to the txdata bits. writes to txdata while txrdy is zero are ignored.

9.6 iceport operations

this section describes the different operations of the iceport and is divided into the following sections:
section 9.6.1, scbus read/write transactions
section 9.6.2, reset
section 9.6.3, serial bit stream
section 9.6.4, iceport receive and transmit
section 9.6.5, clock domains and properties

9.6.1 scbus read/write transactions

all read and write operations to the iceport occur through the scbus. both transactions require two cycles once scbus arbitration is decided. for either transaction, the bus master first must win arbitration for scbus control and decide to initiate a transaction. the bus master then places the target address on scaop[31:0] and asserts sctssn for one cycle to indicate the start of a new operation. the iceport continuously decodes scaop[31:0] and monitors sctssn for transactions that target the iceport. if the iceport is the target of an operation, it checks scdoen to determine whether the transaction is a read or a write. for a read, the iceport places data on the sc_icedop output bus, asserts sc_icerdyp, and then asserts sc_icedoep in the next clock cycle. for a write, the iceport latches the data on sc_icedip into the proper register on the next rising edge of the clock and asserts sc_icerdyp in the following clock cycle. to ensure that information is not lost, the scbus master must hold scaop[31:0], scdoen, and scdop[31:0] until the iceport asserts the sc_icerdyp acknowledge signal.
for data transfer, the scdop[7:0] output bus connects to the iceport sc_icedip[7:0] input bus, and the scdip[31:0] input bus connects to the iceport sc_icedop[31:0] output bus. the upper 32 bits of both scbus data buses, scdop[63:32] and scdip[63:32], are not used for iceport transactions.

figure 9.9 shows an iceport read, and figure 9.10 shows an iceport write. in both figures, cresetn, wresetn, se, and testmp are assumed deasserted throughout the transaction. all read/write operations are synchronous to the rising edge of sclkp. detailed descriptions follow the figures.

figure 9.9 read operation (waveforms of sclkp, sctssn, scaop[31:0], scdoen, sc_icedop[31:0], sc_icerdyp, and sc_icedoep over cycles 1 to 4)
figure 9.10 write operation (waveforms of sclkp, sctssn, scaop[31:0], scdoen, sc_icedip[7:0], sc_icerdyp, and sc_icedoep over cycles 1 to 4)

the following comments outline the operations during cycles 1 to 4 in both figures.
cycle 1: the bus master wins arbitration of the scbus.
cycle 2: the bus master asserts sctssn for one cycle to indicate the start of a new transaction. it also places the target address on scaop[31:0] and asserts scdoen for a write operation, or deasserts scdoen for a read operation. for a write, the bus master also drives scdop[31:0] with the data to be transferred.
cycle 3: the iceport recognizes that it is the transaction target. for a read, the iceport places the appropriate data on the sc_icedop[31:0] bus and asserts sc_icedoep. for a write, the iceport writes the sc_icedip[7:0] data into the appropriate register. the iceport then asserts sc_icerdyp to indicate that the transaction has finished.
cycle 4: the iceport deasserts sc_icerdyp at the rising edge of sclkp. for a read transaction, the iceport also deasserts sc_icedoep, and the scbus master must latch the data on the rising edge of sclkp at the start of this cycle. at the end of cycle 4, the iceport is ready to begin a new transaction.
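the cycle-by-cycle description above can be captured in a small behavioral model of the iceport side of the handshake: a start seen in one cycle is acknowledged in the next and released in the cycle after. the sketch follows the cycle 1 to 4 list (sc_icedoep and sc_icerdyp both asserted in cycle 3) and assumes the address already decodes to the iceport; ice_scbus_trace and ice_bus_cycle are hypothetical names.

```c
#include <stdbool.h>

/* signal levels in one sclkp cycle, as logical assertions (sctssn is
 * active low on the wire; true here means asserted) */
typedef struct {
    bool sctss;   /* bus master: transaction start */
    bool rdy;     /* iceport: sc_icerdyp */
    bool doe;     /* iceport: sc_icedoep, reads only */
} ice_bus_cycle;

/* fill trace[0..ncycles-1] for cycles 1..ncycles; the master asserts
 * sctssn in start_cycle and the iceport is the addressed target */
void ice_scbus_trace(bool is_read, int start_cycle,
                     ice_bus_cycle *trace, int ncycles)
{
    bool respond = false;              /* start seen in the previous cycle */
    for (int c = 1; c <= ncycles; c++) {
        ice_bus_cycle *t = &trace[c - 1];
        t->sctss = (c == start_cycle);
        t->rdy   = respond;            /* acknowledge for exactly one cycle */
        t->doe   = respond && is_read; /* data valid only on reads */
        respond  = t->sctss;
    }
}
```

with start_cycle = 2 the trace reproduces the figures: sctssn in cycle 2, sc_icerdyp (and sc_icedoep for a read) in cycle 3, and both deasserted in cycle 4.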
9.6.2 reset

an iceport system reset occurs when either cresetn or wresetn is asserted for at least one sclkp cycle. cresetn must be asserted at power-up to put the iceport in a predefined state. since the reset signals are synchronous to sclkp, the iceport can be reset even if the iceclkp clock is not running. an iceport system reset performs the following functions:
the rxoverrun and rxrdy bits in the rx status register are cleared, indicating that the rx data register is undefined.
the rxrxrdype bit in the rx setup register is cleared, which deasserts the irxrdyp (sc_iceintp) signal.
the txrdy bit in the tx status register is set.

9.6.3 serial bit stream

the iceport receives data on icerxp and transmits data on icetxp as serial bit streams. when no data is being transferred, the transmit (tx) block holds icetxp idle high. figure 9.11 shows how the serial bit stream on the data line is interpreted. data bytes are sent in frames, each consisting of three parts:
a start bit, always low
a byte of data, transmitted at true level from the lsb (bit 0) to the msb (bit 7)
a stop bit, always high
all bits in a frame are 16 iceclkp cycles long. after the stop bit, the data line remains high while idle, until the next start bit drives the line low.
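the frame layout above maps directly onto a bit-level encoder. the sketch returns the line level for each of the ten bit times and, for reference, the iceclkp cycle at which each bit starts; the names are hypothetical. with bit k starting at cycle 16k, data bit 7 (frame position 8) starts at cycle 128, which is when section 9.6.4.2 says txrdy is set again.

```c
#include <stdint.h>

enum { ICE_FRAME_BITS = 10, ICE_CLK_PER_BIT = 16 };

/* line level per bit time: [0] start (low), [1..8] data bits 0..7
 * at true level, lsb first, [9] stop (high) */
void ice_encode_frame(uint8_t byte, uint8_t level[ICE_FRAME_BITS])
{
    level[0] = 0;
    for (int i = 0; i < 8; i++)
        level[1 + i] = (uint8_t)((byte >> i) & 1u);
    level[ICE_FRAME_BITS - 1] = 1;
}

/* iceclkp cycle, counted from the start-bit edge, at which frame bit k begins */
unsigned ice_bit_start_cycle(unsigned k)
{
    return ICE_CLK_PER_BIT * k;
}
```

a full frame therefore occupies 160 iceclkp cycles, matching the 10 bit times used in the throughput arithmetic of section 9.1.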
figure 9.11 serial bit stream (data line with idle state, start bit, data byte bit positions 0 to 7, and stop bit forming one frame; each bit lasts 16 iceclkp cycles)

9.6.4 iceport receive and transmit

there are two iceport serial interface blocks: the receive (rx) and transmit (tx) blocks. the rx block receives the icerxp bit stream, and the tx block transmits the icetxp bit stream. both blocks receive the internal cpu clock (sclkp) and the external bit rate clock (iceclkp), and both synchronize timing between the iceclkp and sclkp clock domains. figure 9.12 shows a simple block diagram of the rx and tx blocks with their signals and clocking.

figure 9.12 rx and tx blocks (icerxp into the receive (rx) block and icetxp out of the transmit (tx) block, both clocked by iceclkp and sclkp and connected to the generic interface)
the remainder of this section details the iceport receive and transmit operations.

9.6.4.1 receive (rx)

icerxp is the serial data input to the iceport. the rx block samples icerxp on the rising edge of iceclkp. iceclkp can be used to generate both the transmit and receive data clocks, but usually two different clocks are used. this is not a problem as long as the difference between the two clock frequencies stays within the limit given in section 9.6.5, clock domains and properties.

the rx block is synchronized when icerxp has been high for nine bit times (144 iceclkp cycles) or more, which indicates that the data line is idle. the rx block must be resynchronized after power-on, reset, serial cable connection, or any other event that could disturb rx block synchronization.

after synchronization, the rx block samples icerxp on each rising edge of iceclkp. when it samples a low icerxp value, the rx block treats this as the start bit of a new data frame and prepares for the serial data stream. each received bit is assumed to be 16 iceclkp cycles wide, even though the clock that generated the icerxp data may differ from iceclkp. the value of icerxp at the eighth iceclkp rising edge of a bit is taken as the value of that bit. if the start bit sampled this way is high, the frame is ignored, since the low icerxp value that appeared to start the frame was spurious. figure 9.13 shows the serial bit clocking relative to iceclkp.

figure 9.13 received bit timing (icerxp start bit and first and second data bits, each 16 iceclkp cycles wide)
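the sampling rule above (start on the first low sample, then take each bit at its eighth iceclkp edge) can be sketched as a decoder over an array of icerxp samples, one per iceclkp rising edge. the sketch assumes the receiver is already synchronized and that the first low sample counts as edge 1 of the start bit; it also applies the stop-bit check and low-line lockout described in the paragraphs that follow. it is an illustrative model, not the iceport hdl, and ice_rx_frame is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* decode the first complete frame at or after index 'from'.
 * rx[i] is the icerxp level (0/1) at the i-th iceclkp rising edge.
 * on success, stores the byte, and the index just past the stop-bit
 * sample point in *next. */
bool ice_rx_frame(const uint8_t *rx, size_t n, size_t from,
                  uint8_t *byte, size_t *next)
{
    for (size_t s = from; s < n; s++) {
        if (rx[s] != 0)
            continue;                        /* line idle (high) */

        size_t stop_at = s + 7 + 16 * 9;     /* eighth edge of frame bit 9 */
        if (stop_at >= n)
            return false;                    /* frame not complete */

        if (rx[s + 7] != 0)
            continue;                        /* start bit high: spurious */

        uint8_t b = 0;
        for (size_t k = 1; k <= 8; k++)      /* data bits, lsb first */
            b |= (uint8_t)((rx[s + 7 + 16 * k] & 1u) << (k - 1));

        if (rx[stop_at] == 0) {
            /* invalid stop bit: drop the frame and wait for a high level */
            s = stop_at;
            while (s < n && rx[s] == 0)
                s++;
            continue;                        /* loop increment skips the high */
        }
        *byte = b;
        *next = stop_at + 1;
        return true;
    }
    return false;
}
```

calling the function again with from = *next picks up a frame whose start bit immediately follows the stop bit, as the text requires.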
the rx block places the eight data bits received after the start bit in the rx data register. the first data bit received after the start bit is the lsb (bit 0), and the eighth is the msb (bit 7). the eight data bits received between the start and stop bits are all true-level values. a valid high stop bit at the end of the frame sets the rxrdy bit in the rx status register. the irxrdyp (sc_iceintp) output reflects the state of the rxrdy bit, if enabled by the rxrxrdype bit in the rx setup register. irxrdyp can be used as an interrupt to ensure that the cpu reads the received data in time, thus avoiding overruns. if the stop bit is low, the frame is ignored.

a received data byte is not placed in the rx data register until a valid stop bit is received. the byte remains available throughout the reception of the next frame, until the next valid stop bit refreshes the rx data register. in other words, a previously received data byte is present in the rx data register for at least nine bit times (144 iceclkp cycles) after a new start bit is received. if a previously received byte has not been read when a new byte is ready for the rx data register, an overrun error occurs: the iceport sets the rxoverrun bit in the rx status register and discards the new frame. if the iceport receives an invalid stop bit, the stop bit is not recorded in the iceport registers and the frame is discarded.

the iceport does not accept a new start bit until the previous frame has been finished by a valid stop bit or by a high value on icerxp. this ensures that the iceport does not report runaway reception if icerxp is tied low in error. therefore, the iceport does not receive any frame after reset if icerxp is held continuously high or low. when the rx block receives the stop bit correctly, a low value in the bit stream immediately following the stop bit starts the next frame.
the start bit must be allowed to begin this quickly because iceclkp may be slower than the clock that generates the icerxp data; in that case, the next frame may start on the very next iceclkp sample.

9.6.4.2 transmit (tx)

the icetxp signal is the iceport serial data output and can carry a new bit every 16 iceclkp cycles. when there is no data for transmission,
icetxp is held high in an idle state. during this idle state, the txrdy bit in the tx status register is set to one, which indicates that transmission may be initiated by placing data in the tx data register. after data is written to the tx data register, the iceport clears txrdy to zero. transmission of the start bit begins on a rising edge of iceclkp, and the first data bit starts 16 iceclkp cycles later. every bit of the transmitted frame is 16 iceclkp cycles wide. the tx data register lsb (bit 0) is transmitted just after the start bit, and the msb (bit 7) just before the stop bit. all data bits are transmitted at true level, with zeroes sent as low values and ones sent as high values.

the iceport sets the txrdy bit in the tx status register when data bit 7 (the last bit of the byte) begins transmitting. as soon as txrdy is set, the next byte to transmit can be written to the tx data register. writing to the tx data register while data bit 7 or the stop bit is being transmitted ensures that icetxp does not go idle between frames. if the next byte is not written to the tx data register before the stop bit has been transmitted, the tx block idles for a number of iceclkp cycles until new data is available in the tx data register.

9.6.5 clock domains and properties

since data commonly moves between the iceclkp domain and the clock domain that generates icerxp, these two clocks must have frequencies within certain limits. the iceclkp frequency may differ from the icerxp clock frequency by no more than 1%, with an icerxp jitter margin of 10% of the bit width. this jitter can originate from transmission cables or from differing timing of low-to-high and high-to-low transitions. the uart receiving the icetxp output may, however, require a smaller frequency difference, and that requirement must also be observed. the iceclkp signal may be derived from sclkp by using a divider.
this method frees a pin, since iceclkp no longer requires an external pin. the operation of the iceport does not change in any way if iceclkp is derived from sclkp, but the 1% frequency limit must still be met regardless of the clock rate.
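choosing the divider is a rounding problem: iceclkp must be 16 times the bit rate, the resulting rate must stay within the 1% limit, and, per the internal-transfer limit stated in the next paragraph, iceclkp may be at most one fourth of sclkp. a minimal sketch, assuming an integer divider and an exact remote clock; ice_pick_divider and ice_div are hypothetical, and the 90 mhz sclkp in the example is the worst-case frequency from chapter 10, used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t divider;    /* sclkp cycles per iceclkp cycle */
    uint32_t iceclk_hz;  /* resulting iceclkp frequency (truncated) */
    int32_t  error_ppm;  /* bit-rate error vs. target, parts per million */
    bool     ok;         /* |error| <= 1% and iceclkp <= sclkp / 4 */
} ice_div;

/* pick the integer divider closest to sclk_hz / (16 * baud) */
ice_div ice_pick_divider(uint32_t sclk_hz, uint32_t baud)
{
    ice_div d = { 0, 0, 0, false };
    uint64_t target = (uint64_t)baud * 16u;              /* required iceclkp */
    if (target == 0)
        return d;

    uint64_t div = ((uint64_t)sclk_hz + target / 2) / target;  /* round */
    if (div == 0)
        div = 1;

    uint64_t ideal = div * target;   /* sclkp that would make div exact */
    d.divider   = (uint32_t)div;
    d.iceclk_hz = (uint32_t)(sclk_hz / div);
    /* actual/target - 1 = (sclk - div*target) / (div*target) */
    d.error_ppm = (int32_t)(((int64_t)sclk_hz - (int64_t)ideal) * 1000000
                            / (int64_t)ideal);

    int32_t mag = d.error_ppm < 0 ? -d.error_ppm : d.error_ppm;
    d.ok = (mag <= 10000) && (div >= 4);
    return d;
}
```

at sclkp = 90 mhz and 115200 baud this picks a divider of 49 (about -0.35% error); at 1 mbit/s the nearest divider, 6, gives a -6.25% error, so that combination is rejected.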
the iceport may also transfer data internally between the two clock domains (between the iceport and the core). for these transfers, the iceclkp frequency can be at most one fourth of the sclkp frequency. whatever the frequency ratio between iceclkp and sclkp, the bus master must have enough time to read received data before new data arrives, or an overrun error will occur.

9.7 iceport pin buffers and drivers

the choice of iceport external pin buffers and drivers varies with each design. however, this section provides a few general recommendations for any design using an iceport. note that the pin reserved for iceclkp can be saved if the iceport clock is derived internally from sclkp, as described in section 9.6.5, clock domains and properties.

the buffer for the icerxp input pin should be a 5-v-compatible schmitt trigger with an internal pull-up resistor, since the incoming signal may be noisy and driven from a 5-v source. the internal pull-up resistor also allows icerxp to be left unconnected if the iceport is unused.

the driver for the icetxp output pin should be a 4-ma driver with a reduced slew rate to avoid reflections.
chapter 10 specifications

this chapter specifies the physical and electrical characteristics of the CW4011 core. it contains the following sections:
section 10.1, physical specifications
section 10.2, ac timing and loading

10.1 physical specifications

the CW4011 has a single 1x clock input. the clock duty cycle may vary from 40% to 60% at maximum frequency. the CW4011 operates at 90 mhz under worst-case process conditions, 3.14 v, and 110 °c junction temperature. the CW4011 dissipates approximately 7.0 mw/mhz. table 10.1 lists the dimensions of the CW4011 core in g10 technology.

table 10.1 CW4011 physical layout size

core     technology  width   height  total area
CW4011   g10-p       2.5 mm  3.5 mm  8.75 mm²

10.2 ac timing and loading

the input setup time is defined from the point at which the signal becomes valid to the rising edge of sclkp, and the input hold time is defined from the rising edge of sclkp to the point at which the signal may change. for input setup times, the driver must drive the signal valid before any receiver needs it. for input hold times, the driver must hold the signal valid longer than any receiver needs it. the output maximum and minimum delay times are defined from the rising edge of sclkp to the point at which the signal becomes valid.
load is the total load on the net, in standard loads, visible to the output driver. the loading values are for internal loading only; the load column shows the internal loading on each net in the module. table 10.2 shows the timing conditions, and figure 10.1 shows how the ac timing is defined. table 10.3 and table 10.4 list the ac timing values and the loading for the CW4011. the timing values come from motive static timing analysis.

note: the conditions used in this timing analysis were chosen to be representative of a typical CW4011 design and are intended as a guide for designers. the numbers obtained for individual designs may vary, since loading depends on chip placement and routing. if the loading exceeds the listed values, the timing values may exceed those listed here.

figure 10.1 ac specifications (timing diagram: output signal max delay and min delay measured from the sclkp rising edge; input signal setup and hold around the sclkp rising edge; clock period)

table 10.2 CW4011 timing considerations

ac timing  process  vdd (volts)  junction temperature (°c)  clock period (ns)
bccom      0.874    3.46         0                          11.0
wc110      1.38     3.14         110                        11.0
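the setup and hold definitions above translate into a simple per-path budget. as an illustration only, consider a path that leaves the core on scaop[31:0] and returns on scdip[63:0] in the same sclkp cycle: at wc110 (11.0 ns period, scaop max 1.60 ns, scdip setup 0.62 ns from tables 10.3 and 10.4) about 8.78 ns remain for external logic and wiring, and at bccom the hold check compares scaop min 0.70 ns against scdip hold 0.33 ns. the sketch ignores clock skew and routing delay; the function names are hypothetical.

```c
#include <stdint.h>

/* all times in picoseconds so the arithmetic is exact */

/* time left for external logic on a same-cycle path:
 * period - output max delay - input setup  (must be >= 0) */
int32_t setup_budget_ps(int32_t period_ps, int32_t out_max_ps,
                        int32_t in_setup_ps)
{
    return period_ps - out_max_ps - in_setup_ps;
}

/* hold margin: output min delay + external min delay - input hold
 * (must be >= 0, checked at the fast corner) */
int32_t hold_margin_ps(int32_t out_min_ps, int32_t ext_min_ps,
                       int32_t in_hold_ps)
{
    return out_min_ps + ext_min_ps - in_hold_ps;
}
```

with the table values these give 8780 ps of setup budget and 370 ps of hold margin (with zero external delay); a negative result from either function would flag a violation on that path.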
Table 10.3 CW4011 Input AC Timing and Loading

                    BCCOM                 WC110
  Signal name       Setup (ns) Hold (ns)  Setup (ns) Hold (ns)  Standard loads (pF)
  bendn (1)         -          -          -          -          0.75
  cinvap[31:5]      -0.23      0.59       -0.39      1.12       0.75
  cpbusyn[3:1]      3.03       -0.56      5.72       -1.05      0.75
  cpcondp[3:0]      2.81       -0.54      5.14       -1.04      0.75
  cpfrcdp[31:0]     1.30       -0.05      2.51       -0.02      0.75
  cpsreqn[3:1]      2.86       -0.79      5.13       -1.41      0.75
  cresetn           4.11       -0.31      7.11       -0.55      0.75
  dcinvsn           0.11       0.24       0.25       0.45       0.75
  exintn[5:0]       0.22       0.10       0.37       0.19       0.75
  exvap[31:2]       1.82       -0.39      3.65       -0.90      0.75
  exvintn           0.16       -0.05      0.28       -0.10      0.75
  fpeoddn           -0.07      0.17       -0.11      0.25       0.75
  fperrxn           0.06       0          0.18       -0.02      0.75
  icinvsn           0.11       0.24       0.25       0.45       0.75
  implop[3:0]       1.58       -0.87      3.19       -1.80      0.75
  nmin              0.34       0.21       0.65       0.41       0.75
  ocacceptp         1.98       -0.13      3.75       -0.26      0.75
  revlop[3:0]       1.64       -0.70      3.32       -1.46      0.75
  scb32n            2.66       -0.16      4.80       -0.22      0.75
  scberrn           2.71       -0.29      4.91       -0.55      0.75
  scbpwan           0.64       -0.34      1.22       -0.69      0.75
  scbrdyn           2.66       -0.36      4.80       -0.64      0.75
  scbrtyn           2.60       -0.33      4.71       -0.55      0.75
  scdip[63:0]       0.28       0.33       0.62       0.60       0.75
  schrqn            3.01       -0.37      5.0(?)     -0.70      0.75
  sctsen            2.70       -0.27      4.51       -0.56      0.75
  testmp (1)        -          -          -          -          0.75
  wresetn           4.08       -0.11      7.06       -0.25      0.75

  1. bendn and testmp are strapped input signals and do not change state.

Table 10.4 CW4011 Output AC Timing and Loading

                    BCCOM                 WC110
  Signal name       Min (ns)   Max (ns)   Min (ns)   Max (ns)   Standard loads (pF)
  accsizep[1:0]     0.87       0.89       1.64       1.68       0.75
  accstorep         0.94       0.98       1.76       1.86       0.75
  brlikfn           0.82       2.74       1.57       5.17       0.75
  cpcodep[31:0]     1.16       3.70       2.28       6.74       0.75
  cpfixupn          1.06       1.62       1.96       3.07       0.75
  cpfrcen           1.57       4.27       2.93       7.75       0.75
  cpmissn           1.66       4.11       2.98       7.50       0.75
  cprstn[3:1]       0.78       0.79       1.46       1.49       0.75
  cptocdp[31:0]     0.94       3.53       1.74       6.87       0.75
  cptocen           0.98       3.88       1.89       7.13       0.75
  cpxoddn           1.72       2.85       3.24       5.24       0.75
  cpxstbn[3:0]      1.31       4.81       2.41       8.69       0.75
  crvalidp          1.03       3.38       1.93       6.12       0.75
  dvaddrp[31:0]     1.16       2.92       2.07       5.55       0.75
  exloadp           0.85       0.85       1.60       1.61       0.75
  exvapen           0.84       0.85       1.60       1.61       0.75
  pcancrn           1.32       2.88       2.43       5.14       0.75
  pcanoddn          1.08       3.19       1.92       5.75       0.75
  pstalln           1.46       3.76       2.65       6.93       0.75
  scanreqp          0.94       3.64       1.72       6.76       0.75
  scaoen            0.66       0.76       1.34       1.39       0.75
  scaop[31:0]       0.70       0.82       1.39       1.60       0.75
  scbgep            0.72       0.79       1.31       1.48       0.75
  scdoen            0.80       0.96       1.59       1.70       0.75
  scdop[63:0]       0.71       0.80       1.39       1.57       0.75
  schgtn            0.80       0.85       1.53       1.59       0.75
  scifetn           0.57       0.69       1.14       1.41       0.75
  sclockn           0.58       0.64       1.17       1.31       0.75
  sctben[7:0]       0.62       0.74       1.22       1.49       0.75
  sctbln            1.04       1.23       2.00       2.22       0.75
  sctbstn           0.57       0.69       1.14       1.41       0.75
  sctpwn            0.57       0.69       1.14       1.41       0.75
  sctrqn            1.23       3.12       2.31       5.85       0.75
  sctssn            0.57       0.69       1.14       1.41       0.75
  suspexn           1.03       1.68       1.91       3.20       0.75
Appendix A CW4011 Register Summary

This appendix briefly describes the CW4011 core registers and lists all CW4011-specific registers. It is divided into two sections:

  Section A.1, CW4011 CPU Registers
  Section A.2, Register Summary

A.1 CW4011 CPU Registers

Figure A.1 shows the CW4011 CPU registers. There are 32 general registers, each consisting of a single 32-bit word. The 32 general registers are treated symmetrically, with two exceptions: r0 is hardwired to zero, and r31 is defined as the link register for jump-and-link instructions.

Figure A.1 CW4011 CPU Registers (32-bit general-purpose registers r0 through r31; 32-bit multiply/divide registers HI and LO; 32-bit program counter PC)
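The register conventions above can be modeled in a few lines. The Python sketch below assumes MIPS-style semantics for the multiply/divide registers (HI receives the upper word of a product and the remainder of a divide; LO receives the lower word and the quotient). This appendix does not spell that split out, and the class and method names are hypothetical; the sketch is not a model of the CW4011 RTL.

```python
# Sketch of the CPU register conventions: r0 reads as zero and discards writes,
# r31 is the link register, and HI/LO together hold a 64-bit result.
# Assumption: MIPS-style HI/LO usage for multiply and divide.

MASK32 = 0xFFFF_FFFF


class RegisterFile:
    def __init__(self):
        self.gpr = [0] * 32
        self.hi = 0
        self.lo = 0

    def read(self, r: int) -> int:
        return 0 if r == 0 else self.gpr[r]

    def write(self, r: int, value: int) -> None:
        if r != 0:  # a result targeted at r0 is discarded
            self.gpr[r] = value & MASK32

    def link(self, return_address: int) -> None:
        """Jump-and-link instructions place the return address in r31."""
        self.write(31, return_address)

    def multu(self, rs: int, rt: int) -> None:
        """Unsigned multiply: upper 32 bits to HI, lower 32 bits to LO."""
        product = self.read(rs) * self.read(rt)
        self.hi, self.lo = (product >> 32) & MASK32, product & MASK32

    def divu(self, rs: int, rt: int) -> None:
        """Unsigned divide (MIPS convention): quotient to LO, remainder to HI."""
        divisor = self.read(rt)
        if divisor == 0:
            return  # result is undefined; this sketch leaves HI/LO unchanged
        self.lo, self.hi = divmod(self.read(rs), divisor)


rf = RegisterFile()
rf.write(0, 123)          # discarded
rf.write(1, 0xFFFF_FFFF)
rf.write(2, 2)
rf.multu(1, 2)
print(rf.read(0), hex(rf.hi), hex(rf.lo))  # 0 0x1 0xfffffffe
```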
Register r0 may be specified as the target register of any instruction whose result is to be discarded. Because r0 is hardwired to zero, it supplies a value of zero when used as a source register.

The two multiply/divide registers, HI and LO, together hold the 64-bit (doubleword) result of multiply and divide operations.

A.2 Register Summary

Table A.1 lists the CW4011-specific registers, their location by either CP0 register number or physical address, and the page on which each is described. All registers listed in this section are separate from the general CPU registers described in Section A.1, CW4011 CPU Registers.

Table A.1 CP0 Exception Processing Registers

  Register name                               CP0 register  Physical     Reference
                                              number        address      page
  Context                                     4             -            4-5
  Debug control and status (DCS)              7             -            4-7
  Bad virtual address (BadVAddr)              8             -            4-9
  Count                                       9             -            4-9
  Compare                                     11            -            4-9
  Status                                      12            -            4-10
  Cause                                       13            -            4-18
  Exception program counter (EPC)             14            -            4-20
  Processor revision identifier (PRId)        15            -            4-20
  Configuration and cache control (CCC)       16            -            4-22
  Load linked address (LLAdr)                 17            -            4-26
  Breakpoint program counter (BPC)            18            -            4-27
  Breakpoint data address (BDA)               19            -            4-27
  Breakpoint PC mask (BPCM)                   20            -            4-27
  Breakpoint data address mask (BDAM)         21            -            4-28
  Rotate                                      23            -            4-28
  Circular mask (CMask)                       24            -            4-29
  Error exception program counter (ErrorEPC)  30            -            4-30
  EntryHi                                     10            -            5-10
  EntryLo                                     2             -            5-11
  PageMask                                    5             -            5-12
  Index                                       0             -            5-13
  Random                                      1             -            5-13
  Wired                                       6             -            5-14
  Rx status                                   -             0x10ff0000   9-10
  Rx setup                                    -             0x10ff0000   9-11
  Rx data                                     -             0x10ff0004   9-11
  Tx status                                   -             0x10ff0008   9-12
  Tx data                                     -             0x10ff000c   9-12
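Each CP0 register number in Table A.1 is the value placed in the register field of a move-to or move-from coprocessor 0 instruction. The Python sketch below encodes such instructions using the standard MIPS I COP0 format (opcode 010000; rs = 00000 for MFC0, 00100 for MTC0). That the CW4011 uses this encoding unchanged is an assumption to verify against the instruction set summary, and the helper name is hypothetical. The memory-mapped Rx/Tx registers are not CP0 registers and are left out.

```python
# Sketch: CP0 register numbers (Table A.1) placed in the rd field of
# MIPS I style MFC0/MTC0 instructions. Assumption: standard COP0 encoding.

CP0_REGS = {
    "Index": 0, "Random": 1, "EntryLo": 2, "Context": 4, "PageMask": 5,
    "Wired": 6, "DCS": 7, "BadVAddr": 8, "Count": 9, "EntryHi": 10,
    "Compare": 11, "Status": 12, "Cause": 13, "EPC": 14, "PRId": 15,
    "CCC": 16, "LLAdr": 17, "BPC": 18, "BDA": 19, "BPCM": 20, "BDAM": 21,
    "Rotate": 23, "CMask": 24, "ErrorEPC": 30,
}

COP0_OPCODE = 0b010000
MF = 0b00000  # move from coprocessor 0 (MFC0)
MT = 0b00100  # move to coprocessor 0 (MTC0)


def encode_cop0_move(op: int, gpr: int, cp0_name: str) -> int:
    """Build opcode | rs(op) | rt(gpr) | rd(cp0 register) | zeros."""
    rd = CP0_REGS[cp0_name]
    return (COP0_OPCODE << 26) | (op << 21) | (gpr << 16) | (rd << 11)


print(hex(encode_cop0_move(MF, 26, "Status")))  # 0x401a6000 (mfc0 $26, Status)
```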
Appendix B Cache Sizing and Design Concerns

This appendix describes I-cache and D-cache sizing and design considerations in a CW4011 system. It is divided into the following sections:

  Section B.1, CW4011 I-Cache Configurations
  Section B.2, CW4011 I-Cache Interface
  Section B.3, I-Cache Shell
  Section B.4, I-Cache Set Associative RAM Hookup
  Section B.5, I-Cache Direct-Mapped RAM
  Section B.6, CW4011 D-Cache Configurations
  Section B.7, CW4011 D-Cache Interface
  Section B.8, D-Cache Shell
  Section B.9, D-Cache Set Associative RAM Hookup
  Section B.10, D-Cache Direct-Mapped RAM Hookup
B.1 CW4011 I-Cache Configurations

The CW4011 supports several two-way set-associative and direct-mapped I-cache configurations, as shown in Table B.1. Different physical I-cache configurations require different tag RAM and data RAM sizes, as well as different CW4011 core signal connections, as described in Section B.4, I-Cache Set Associative RAM Hookup. However, a larger I-cache design can usually emulate any of the smaller available configurations through the IS[1:0] bits in the CCC register. For example, a 16-Kbyte I-cache can be configured to emulate every other available configuration, provided that the tag memory is wide enough for the smaller configurations: to test a 1-Kbyte configuration, the I-cache tag needs 23 bits. This flexibility allows various configurations to be benchmarked in the simulation model and on the reference device chip. For more information on dynamic cache sizing, see Section 4.3.10, Configuration and Cache Control (CCC) Register.

The following sections describe the physical port interconnects between the CW4011 and the synchronous RAM models. These connections are made in Verilog or VHDL at the shell level; example Verilog and VHDL connections are provided in the CW4011 deliverables database.

Table B.1 CW4011 I-Cache Sizes

  Two-way set-associative  Direct-mapped     CCC register
  I-cache (Kbytes)         I-cache (Kbytes)  IS[1:0] setting
  2                        1                 00
  4                        2                 01
  8                        4                 10
  16                       8                 11
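The link between Table B.1 and the tag-width requirement can be sketched as follows. The IS[1:0] values are copied from Table B.1. The rule that the I-cache tag needs 33 - log2(bytes per set) bits is inferred from Tables B.2 and B.3 (it yields the 23 bits quoted above for a 1-Kbyte cache) and is not stated as a rule in this manual. The function names are hypothetical.

```python
# Sketch: IS[1:0] lookup (Table B.1) and a check of whether a physical
# I-cache's tag RAM is wide enough to emulate a smaller configuration.
# Assumption: tag width = 33 - log2(bytes per set), inferred from Tables B.2/B.3.
from math import log2

# two-way set-associative size (KB) -> (direct-mapped size (KB), IS[1:0])
IS_SETTINGS = {2: (1, 0b00), 4: (2, 0b01), 8: (4, 0b10), 16: (8, 0b11)}


def is_bits(size_kb: int, direct_mapped: bool) -> int:
    """Look up the IS[1:0] value listed for a supported I-cache size."""
    for sa_kb, (dm_kb, bits) in IS_SETTINGS.items():
        if size_kb == (dm_kb if direct_mapped else sa_kb):
            return bits
    raise ValueError(f"unsupported I-cache size: {size_kb} KB")


def icache_tag_bits(size_kb: int, direct_mapped: bool) -> int:
    """Tag width implied by Tables B.2 and B.3 (inferred rule)."""
    bytes_per_set = size_kb * 1024 // (1 if direct_mapped else 2)
    return 33 - int(log2(bytes_per_set))


def can_emulate(physical_tag_width: int, size_kb: int, direct_mapped: bool) -> bool:
    """True if a tag RAM this wide can hold the emulated configuration's tags."""
    return physical_tag_width >= icache_tag_bits(size_kb, direct_mapped)


# A 16-Kbyte cache built with 20-bit tags cannot emulate 1 Kbyte; 23-bit tags can.
print(can_emulate(20, 1, True), can_emulate(23, 1, True))  # False True
```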
B.2 CW4011 I-Cache Interface

The CW4011 I-cache interface is designed to implement a two-way set-associative I-cache of varying size. In a two-way set-associative configuration, the I-cache uses a least recently used (LRU) algorithm to determine which set to replace when both cache lines are valid. Logically, a 256x1 memory is required. Because HDRAMs are not available in formats narrower than four bits, a 64x4 memory is used as the LRU memory for 16-Kbyte systems. For a list of the I-cache RAM interface signals, see Section 7.8, Instruction Cache Interface.

B.3 I-Cache Shell

In the CW4011 RTL shell model, the I-cache tag, data, and LRU memories are gathered in a shell module, icache_shell. Figure B.1 shows the hierarchy. The icache_shell also contains several glue logic gates.

Figure B.1 CW4011 I-Cache Shell RTL (hierarchy: CW4011_shell containing CW4011_core, icache_shell with tag, data, and LRU RAMs, and dcache_shell with tag and data RAMs)
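The LRU requirement follows from keeping one bit per line index, which is why a 16-Kbyte I-cache (256 lines per set) logically needs a 256x1 memory. The Python sketch below models two-way LRU replacement under an assumed polarity (the stored bit names the set to replace next). It also shows one hypothetical way to fold a 256x1 logical memory onto a 64x4 HDRAM; the actual fake256x1 mapping is defined by the HDL model in the deliverables database, not by this sketch.

```python
# Sketch: two-way LRU with one bit per line index.
# Assumption: the stored bit names the set (0 or 1) to replace next.


class TwoWayLru:
    def __init__(self, lines_per_set: int):
        self.bits = [0] * lines_per_set  # one LRU bit per line index

    def touch(self, index: int, set_no: int) -> None:
        """After a hit or refill in set_no, the other set becomes least recently used."""
        self.bits[index] = 1 - set_no

    def victim(self, index: int) -> int:
        """Set to replace when both lines at this index are valid."""
        return self.bits[index]


def fake256x1_location(index: int) -> tuple:
    """Hypothetical folding of a 256x1 address onto a 64x4 RAM: (word, bit)."""
    return index >> 2, index & 0b11


lru = TwoWayLru(256)  # 16-Kbyte I-cache: 256 lines per set
lru.touch(5, 0)
print(lru.victim(5), fake256x1_location(255))  # 1 (63, 3)
```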
B.4 I-Cache Set Associative RAM Hookup

I-cache set-associative implementations require:

  two tag RAMs
  four data RAMs
  one LRU RAM

Table B.2 lists the required RAM sizes. The RAMs should be word-write-enabled synchronous RAMs, such as the m10p111hs for the CW4011 (lcbg10p).

The following sections describe the connections from the CW4011 core I-cache interface ports to the ports of the I-cache RAM macros, assuming m10p111hs RAMs are used. In each connection list, the left column gives the RAM port (or a constant tie-off value, or (open) for an unused core output) and the right column gives the core signal or level it connects to. Unspecified I/O can be considered unconnected. Unconnected core inputs should be tied deasserted; unconnected core outputs may be left open.

For larger cache sizes, the CW4011 core needs fewer tag inputs. Unused tag inputs are ignored according to the programming of the CCC register. If either or both cache sets are used as a RAM whose address is permanently fixed in the memory address space, the tag memory is not necessary; in that case, the tag inputs must be tied low or high according to the specific design's address mapping.

Table B.2 Set-Associative I-Cache RAM Requirements

  Set-assoc.  Tag RAM          Data RAM          LRU RAM (1)
  I-cache     Qty  Size        Qty  Size         Qty  Size
  (Kbytes)         (bits)           (bits)            (bits)
  2           2    32x23       4    128x32       1    32x1
  4           2    64x22       4    256x32       1    64x1
  8           2    128x21      4    512x32       1    128x1
  16          2    256x20      4    1024x32      1    256x1

  1. Listed are the logical requirements for the LRU RAM. Because the minimum HDRAM size is 64x4, one 64x4 HDRAM can be used for any of the I-cache size configurations. The deliverables database includes an HDL model named fake256x1, which renames the signals of a 64x4 RAM to those of a 256x1 RAM.
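The rows of Table B.2 follow a regular pattern, sketched below. The generating rules (32-byte lines, a tag width of 33 - log2(bytes per set), and each set's 64-bit data path split across a high-word and a low-word 32-bit RAM) are inferred from the table rather than stated in the manual, so treat the function as a cross-check, not a specification.

```python
# Sketch: reproduce Table B.2 from the I-cache size.
# Assumptions (inferred from the table): 32-byte lines, tag width of
# 33 - log2(bytes per set), and two 32-bit data RAMs (high/low word) per set.
from math import log2

LINE_BYTES = 32


def icache_sa_requirements(size_kb: int) -> dict:
    """(quantity, size) of each RAM for a two-way set-associative I-cache."""
    bytes_per_set = size_kb * 1024 // 2
    lines = bytes_per_set // LINE_BYTES
    tag_width = 33 - int(log2(bytes_per_set))
    data_depth = bytes_per_set // 8  # 64-bit doublewords per set
    return {
        "tag": (2, f"{lines}x{tag_width}"),
        "data": (4, f"{data_depth}x32"),
        "lru_logical": (1, f"{lines}x1"),
    }


for kb in (2, 4, 8, 16):
    print(kb, icache_sa_requirements(kb))
```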
B.4.1 2-Kbyte I-Cache Set Associative Connections

The following connections must be made between the CW4011 core and the synchronous RAM modules. Connections for the LRU memory, fake256x1, are not shown; refer to the HDL model of fake256x1 for more information.

B.4.1.1 Tag RAM Set 0

  do[22:0]    ic0tagdop[22:0]
  di[22:0]    ictagdip[22:0]
  a[4:0]      itaddrp[4:0]
  (open)      itaddrp[7:5]
  clk         shell clock
  oe          ictagrdp
  we          itwe0p (generated by icache_shell glue logic)
  enable      high

B.4.1.2 Tag RAM Set 1

  do[22:0]    ic1tagdop[22:0]
  di[22:0]    ictagdip[22:0]
  a[4:0]      itaddrp[4:0]
  (open)      itaddrp[7:5]
  clk         shell clock
  oe          ictagrdp
  we          itwe1p (generated by icache_shell glue logic)
  enable      high

B.4.1.3 Data RAM Set 0 High Word

  do[31:0]    ic0datadohp[31:0]
  di[31:0]    icdatadip[63:32]
  a[6:0]      icaddrp[6:0]
  (open)      icaddrp[9:7]
  clk         shell clock
  oe          icdatardp
  we          icwe0hp (generated by icache_shell glue logic)
  enable      high
B.4.1.4 Data RAM Set 0 Low Word

  do[31:0]    ic0datadolp[31:0]
  di[31:0]    icdatadip[31:0]
  a[6:0]      icaddrp[6:0]
  (open)      icaddrp[9:7]
  clk         shell clock
  oe          icdatardp
  we          icwe0lp (generated by icache_shell glue logic)
  enable      high

B.4.1.5 Data RAM Set 1 High Word

  do[31:0]    ic1datadohp[31:0]
  di[31:0]    icdatadip[63:32]
  a[6:0]      icaddrp[6:0]
  (open)      icaddrp[9:7]
  clk         shell clock
  oe          icdatardp
  we          icwe1hp (generated by icache_shell glue logic)
  enable      high

B.4.1.6 Data RAM Set 1 Low Word

  do[31:0]    ic1datadolp[31:0]
  di[31:0]    icdatadip[31:0]
  a[6:0]      icaddrp[6:0]
  (open)      icaddrp[9:7]
  clk         shell clock
  oe          icdatardp
  we          icwe1lp (generated by icache_shell glue logic)
  enable      high
B.4.2 4-Kbyte I-Cache Set Associative Connections

The following connections must be made between the CW4011 core and the synchronous RAM modules. Connections for the LRU memory, fake256x1, are not shown.

B.4.2.1 Tag RAM Set 0

  do[21:0]    ic0tagdop[22:1]
  1'b0        ic0tagdop[1]
  di[21:0]    ictagdip[22:1]
  (open)      ictagdip[1]
  a[5:0]      itaddrp[5:0]
  (open)      itaddrp[7:6]
  clk         shell clock
  oe          ictagrdp
  we          itwe0p (generated by icache_shell glue logic)
  enable      high

B.4.2.2 Tag RAM Set 1

  do[21:0]    ic1tagdop[22:1]
  1'b0        ic1tagdop[1]
  di[21:0]    ictagdip[22:1]
  (open)      ictagdip[1]
  a[5:0]      itaddrp[5:0]
  (open)      itaddrp[7:6]
  clk         shell clock
  oe          ictagrdp
  we          itwe1p (generated by icache_shell glue logic)
  enable      high

B.4.2.3 Data RAM Set 0 High Word

  do[31:0]    ic0datadohp[31:0]
  di[31:0]    icdatadip[63:32]
  a[7:0]      icaddrp[7:0]
  (open)      icaddrp[9:8]
  clk         shell clock
  oe          icdatardp
  we          icwe0hp (generated by icache_shell glue logic)
  enable      high

B.4.2.4 Data RAM Set 0 Low Word

  do[31:0]    ic0datadolp[31:0]
  di[31:0]    icdatadip[31:0]
  a[7:0]      icaddrp[7:0]
  (open)      icaddrp[9:8]
  clk         shell clock
  oe          icdatardp
  we          icwe0lp (generated by icache_shell glue logic)
  enable      high

B.4.2.5 Data RAM Set 1 High Word

  do[31:0]    ic1datadohp[31:0]
  di[31:0]    icdatadip[63:32]
  a[7:0]      icaddrp[7:0]
  (open)      icaddrp[9:8]
  clk         shell clock
  oe          icdatardp
  we          icwe1hp (generated by icache_shell glue logic)
  enable      high

B.4.2.6 Data RAM Set 1 Low Word

  do[31:0]    ic1datadolp[31:0]
  di[31:0]    icdatadip[31:0]
  a[7:0]      icaddrp[7:0]
  (open)      icaddrp[9:8]
  clk         shell clock
  oe          icdatardp
  we          icwe1lp (generated by icache_shell glue logic)
  enable      high
B.4.3 8-Kbyte I-Cache Set Associative Connections

The following connections must be made between the CW4011 core and the synchronous RAM modules. Connections for the LRU memory, fake256x1, are not shown.

B.4.3.1 Tag RAM Set 0

  do[20:0]    ic0tagdop[22:2]
  2'b0        ic0tagdop[2:1]
  di[20:0]    ictagdip[22:2]
  (open)      ictagdip[2:1]
  a[6:0]      itaddrp[6:0]
  (open)      itaddrp[7]
  clk         shell clock
  oe          ictagrdp
  we          itwe0p (generated by icache_shell glue logic)
  enable      high

B.4.3.2 Tag RAM Set 1

  do[20:0]    ic1tagdop[22:2]
  2'b0        ic1tagdop[2:1]
  di[20:0]    ictagdip[22:2]
  (open)      ictagdip[2:1]
  a[6:0]      itaddrp[6:0]
  (open)      itaddrp[7]
  clk         shell clock
  oe          ictagrdp
  we          itwe1p (generated by icache_shell glue logic)
  enable      high

B.4.3.3 Data RAM Set 0 High Word

  do[31:0]    ic0datadohp[31:0]
  di[31:0]    icdatadip[63:32]
  a[8:0]      icaddrp[8:0]
  (open)      icaddrp[9]
  clk         shell clock
  oe          icdatardp
  we          icwe0hp (generated by icache_shell glue logic)
  enable      high

B.4.3.4 Data RAM Set 0 Low Word

  do[31:0]    ic0datadolp[31:0]
  di[31:0]    icdatadip[31:0]
  a[8:0]      icaddrp[8:0]
  (open)      icaddrp[9]
  clk         shell clock
  oe          icdatardp
  we          icwe0lp (generated by icache_shell glue logic)
  enable      high

B.4.3.5 Data RAM Set 1 High Word

  do[31:0]    ic1datadohp[31:0]
  di[31:0]    icdatadip[63:32]
  a[8:0]      icaddrp[8:0]
  (open)      icaddrp[9]
  clk         shell clock
  oe          icdatardp
  we          icwe1hp (generated by icache_shell glue logic)
  enable      high

B.4.3.6 Data RAM Set 1 Low Word

  do[31:0]    ic1datadolp[31:0]
  di[31:0]    icdatadip[31:0]
  a[8:0]      icaddrp[8:0]
  (open)      icaddrp[9]
  clk         shell clock
  oe          icdatardp
  we          icwe1lp (generated by icache_shell glue logic)
  enable      high
B.4.4 16-Kbyte I-Cache Set Associative Connections

The following connections must be made between the CW4011 core and the synchronous RAM modules. Connections for the LRU memory, fake256x1, are not shown; refer to the HDL model of fake256x1 for more information.

B.4.4.1 Tag RAM Set 0

  do[19:0]    ic0tagdop[22:3]
  3'b0        ic0tagdop[3:1]
  di[19:0]    ictagdip[22:3]
  (open)      ictagdip[3:1]
  a[7:0]      itaddrp[7:0]
  clk         shell clock
  oe          ictagrdp
  we          itwe0p (generated by icache_shell glue logic)
  enable      high

B.4.4.2 Tag RAM Set 1

  do[19:0]    ic1tagdop[22:3]
  3'b0        ic1tagdop[3:1]
  di[19:0]    ictagdip[22:3]
  (open)      ictagdip[3:1]
  a[7:0]      itaddrp[7:0]
  clk         shell clock
  oe          ictagrdp
  we          itwe1p (generated by icache_shell glue logic)
  enable      high

B.4.4.3 Data RAM Set 0 High Word

  do[31:0]    ic0datadohp[31:0]
  di[31:0]    icdatadip[63:32]
  a[9:0]      icaddrp[9:0]
  clk         shell clock
  oe          icdatardp
  we          icwe0hp (generated by icache_shell glue logic)
  enable      high
B.4.4.4 Data RAM Set 0 Low Word

  do[31:0]    ic0datadolp[31:0]
  di[31:0]    icdatadip[31:0]
  a[9:0]      icaddrp[9:0]
  clk         shell clock
  oe          icdatardp
  we          icwe0lp (generated by icache_shell glue logic)
  enable      high

B.4.4.5 Data RAM Set 1 High Word

  do[31:0]    ic1datadohp[31:0]
  di[31:0]    icdatadip[63:32]
  a[9:0]      icaddrp[9:0]
  clk         shell clock
  oe          icdatardp
  we          icwe1hp (generated by icache_shell glue logic)
  enable      high

B.4.4.6 Data RAM Set 1 Low Word

  do[31:0]    ic1datadolp[31:0]
  di[31:0]    icdatadip[31:0]
  a[9:0]      icaddrp[9:0]
  clk         shell clock
  oe          icdatardp
  we          icwe1lp (generated by icache_shell glue logic)
  enable      high
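The address-port rows in Sections B.4.1 through B.4.4 can be derived from the cache size: each RAM uses only the low-order bits of itaddrp[7:0] or icaddrp[9:0] that it needs, and the remaining core address outputs are left open. The Python sketch below assumes one tag entry per 32-byte line and one data entry per 64-bit doubleword per set, which reproduces the listed rows; the function names are hypothetical.

```python
# Sketch: derive the a[...] / (open) rows of the I-cache connection lists.
# Assumptions: one tag entry per 32-byte line, one data entry per 64-bit
# doubleword per set. The core drives itaddrp[7:0] and icaddrp[9:0].
from math import log2


def address_rows(ram_bits: int, core_signal: str, core_width: int) -> list:
    """Rows for one RAM address port: low bits connected, upper core bits open."""
    rows = [f"a[{ram_bits - 1}:0]  {core_signal}[{ram_bits - 1}:0]"]
    if ram_bits < core_width:
        hi, lo = core_width - 1, ram_bits
        rows.append(f"(open)  {core_signal}[{hi}]" if hi == lo
                    else f"(open)  {core_signal}[{hi}:{lo}]")
    return rows


def icache_sa_address_rows(size_kb: int) -> dict:
    """Address-port rows for a two-way set-associative I-cache of size_kb Kbytes."""
    bytes_per_set = size_kb * 1024 // 2
    tag_bits = int(log2(bytes_per_set // 32))  # tag RAM entries = lines per set
    data_bits = int(log2(bytes_per_set // 8))  # data RAM entries = doublewords per set
    return {
        "tag": address_rows(tag_bits, "itaddrp", 8),
        "data": address_rows(data_bits, "icaddrp", 10),
    }


for kb in (2, 4, 8, 16):
    print(kb, icache_sa_address_rows(kb))
```

The direct-mapped lists in Section B.5 follow the same rule with the bytes per set equal to the whole cache size.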
B.5 I-Cache Direct-Mapped RAM

Direct-mapped I-cache implementations require one tag RAM and two data RAMs. Table B.3 lists the required RAM sizes. The RAMs should be word-write-enabled synchronous RAMs, such as the m10p111hs for the CW4011 (lcbg10p). In a direct-mapped cache configuration, the inputs for one set are not used. All unused inputs are ignored internally, but they should always be tied deasserted.

The following sections describe the connections from the CW4011 core I-cache interface ports to the ports of the I-cache RAM macros, assuming m10p111hs RAMs. Unspecified I/O can be considered unconnected; unconnected core inputs should be tied deasserted.

Table B.3 Direct-Mapped I-Cache RAM Requirements

  Direct-mapped  Tag RAM          Data RAM
  I-cache        Qty  Size        Qty  Size
  (Kbytes)            (bits)           (bits)
  1              1    32x23       2    128x32
  2              1    64x22       2    256x32
  4              1    128x21      2    512x32
  8              1    256x20      2    1024x32
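Comparing Table B.3 with Table B.2 shows that each direct-mapped configuration uses the same tag and data RAM geometry as one set of the set-associative cache twice its size; only the RAM counts differ (one tag and two data RAMs instead of two and four). The short Python check below copies both tables and asserts that relationship; it is an illustration, not part of the deliverables.

```python
# Sketch: Table B.3 rows equal the per-set RAM geometry of Table B.2 rows for
# a cache twice the size. Values are copied from the two tables.

TABLE_B2 = {  # set-associative size (KB): (tag RAM size, data RAM size)
    2: ("32x23", "128x32"), 4: ("64x22", "256x32"),
    8: ("128x21", "512x32"), 16: ("256x20", "1024x32"),
}
TABLE_B3 = {  # direct-mapped size (KB): (tag RAM size, data RAM size)
    1: ("32x23", "128x32"), 2: ("64x22", "256x32"),
    4: ("128x21", "512x32"), 8: ("256x20", "1024x32"),
}

for dm_kb, sizes in TABLE_B3.items():
    # One direct-mapped set has the same geometry as one set of a 2x cache.
    assert sizes == TABLE_B2[2 * dm_kb], dm_kb

print("direct-mapped RAMs match one set of the 2x set-associative cache")
```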
B.5.1 1-Kbyte I-Cache Direct-Mapped Connections

The following connections must be made between the CW4011 core and the synchronous RAM modules.

B.5.1.1 Tag RAM

  do[22:0]    ic0tagdop[22:0]
  di[22:0]    ictagdip[22:0]
  a[4:0]      itaddrp[4:0]
  (open)      itaddrp[7:5]
  clk         shell clock
  oe          ictagrdp
  we          itwe0p (generated by icache_shell glue logic)
  enable      high

B.5.1.2 Data RAM High Word

  do[31:0]    ic0datadohp[31:0]
  di[31:0]    icdatadip[63:32]
  a[6:0]      icaddrp[6:0]
  (open)      icaddrp[9:7]
  clk         shell clock
  oe          icdatardp
  we          icwe0hp (generated by icache_shell glue logic)
  enable      high

B.5.1.3 Data RAM Low Word

  do[31:0]    ic0datadolp[31:0]
  di[31:0]    icdatadip[31:0]
  a[6:0]      icaddrp[6:0]
  (open)      icaddrp[9:7]
  clk         shell clock
  oe          icdatardp
  we          icwe0lp (generated by icache_shell glue logic)
  enable      high
B.5.2 2-Kbyte I-Cache Direct-Mapped Connections

The following connections must be made between the CW4011 core and the synchronous RAM modules.

B.5.2.1 Tag RAM

  do[21:0]    ic0tagdop[22:1]
  1'b0        ic0tagdop[1]
  di[21:0]    ictagdip[22:1]
  (open)      ictagdip[1]
  a[5:0]      itaddrp[5:0]
  (open)      itaddrp[7:6]
  clk         shell clock
  oe          ictagrdp
  we          itwe0p (generated by icache_shell glue logic)
  enable      high

B.5.2.2 Data RAM High Word

  do[31:0]    ic0datadohp[31:0]
  di[31:0]    icdatadip[63:32]
  a[7:0]      icaddrp[7:0]
  (open)      icaddrp[9:8]
  clk         shell clock
  oe          icdatardp
  we          icwe0hp (generated by icache_shell glue logic)
  enable      high

B.5.2.3 Data RAM Low Word

  do[31:0]    ic0datadolp[31:0]
  di[31:0]    icdatadip[31:0]
  a[7:0]      icaddrp[7:0]
  (open)      icaddrp[9:8]
  clk         shell clock
  oe          icdatardp
  we          icwe0lp (generated by icache_shell glue logic)
  enable      high
B.5.3 4-Kbyte I-Cache Direct-Mapped Connections

The following connections must be made between the CW4011 core and the synchronous RAM modules.

B.5.3.1 Tag RAM

  do[20:0]    ic0tagdop[22:2]
  2'b0        ic0tagdop[2:1]
  di[20:0]    ictagdip[22:2]
  (open)      ictagdip[2:1]
  a[6:0]      itaddrp[6:0]
  (open)      itaddrp[7]
  clk         shell clock
  oe          ictagrdp
  we          itwe0p (generated by icache_shell glue logic)
  enable      high

B.5.3.2 Data RAM High Word

  do[31:0]    ic0datadohp[31:0]
  di[31:0]    icdatadip[63:32]
  a[8:0]      icaddrp[8:0]
  (open)      icaddrp[9]
  clk         shell clock
  oe          icdatardp
  we          icwe0hp (generated by icache_shell glue logic)
  enable      high

B.5.3.3 Data RAM Low Word

  do[31:0]    ic0datadolp[31:0]
  di[31:0]    icdatadip[31:0]
  a[8:0]      icaddrp[8:0]
  (open)      icaddrp[9]
  clk         shell clock
  oe          icdatardp
  we          icwe0lp (generated by icache_shell glue logic)
  enable      high
B.5.4 8-Kbyte I-Cache Direct-Mapped Connections

The following connections must be made between the CW4011 core and the synchronous RAM modules.

B.5.4.1 Tag RAM

  do[19:0]    ic0tagdop[22:3]
  3'b0        ic0tagdop[3:1]
  di[19:0]    ictagdip[22:3]
  (open)      ictagdip[3:1]
  a[7:0]      itaddrp[7:0]
  clk         shell clock
  oe          ictagrdp
  we          itwe0p (generated by icache_shell glue logic)
  enable      high

B.5.4.2 Data RAM High Word

  do[31:0]    ic0datadohp[31:0]
  di[31:0]    icdatadip[63:32]
  a[9:0]      icaddrp[9:0]
  clk         shell clock
  oe          icdatardp
  we          icwe0hp (generated by icache_shell glue logic)
  enable      high

B.5.4.3 Data RAM Low Word

  do[31:0]    ic0datadolp[31:0]
  di[31:0]    icdatadip[31:0]
  a[9:0]      icaddrp[9:0]
  clk         shell clock
  oe          icdatardp
  we          icwe0lp (generated by icache_shell glue logic)
  enable      high
B.5.5 Instruction RAM

I-cache set 1 can be used as a scratchpad RAM. The I-cache scratchpad RAM is enabled by setting the ISR1 bit in the CCC register to one. If the address space is permanently fixed, the tag memory for I-cache set 1 is not necessary; the tag inputs must instead be tied low or high according to the design's address mapping. If the address space must be programmable, the tag memory must be initialized before I-cache set 1 is used as an instruction RAM.

In both cases, the instruction codes for the instruction RAM must be written into the set 1 instruction data RAM by a cache maintenance function, which is enabled by the isolate cache (IsC) and Tag bits of the CCC register. For more information, see Section 4.3.10, Configuration and Cache Control (CCC) Register.

The following is an example of a 4-Kbyte instruction RAM configuration without a tag.

B.5.5.1 Tag RAM Set 1

  do[31:12]   ic0tagdop[21:2]
  1'b1        ic0tagdop[22]
  2'b00       ic0tagdop[1:0]
  (open)      ictagdip[22:0]
  (open)      itaddrp[7:0]
  (open)      ictagrdp
  (open)      itwe0p (generated by icache_shell glue logic)
  clk         shell clock
  enable      high

B.5.5.2 Data RAM High Word

  do[31:0]    ic1datadohp[31:0]
  di[31:0]    icdatadip[63:32]
  a[8:0]      icaddrp[8:0]
  (open)      icaddrp[9]
  clk         shell clock
  oe          icdatardp
  we          icwe1hp (generated by icache_shell glue logic)
  enable      high

B.5.5.3 Data RAM Low Word

  do[31:0]    ic1datadolp[31:0]
  di[31:0]    icdatadip[31:0]
  a[8:0]      icaddrp[8:0]
  (open)      icaddrp[9]
  clk         shell clock
  oe          icdatardp
  we          icwe1lp (generated by icache_shell glue logic)
  enable      high

B.6 CW4011 D-Cache Configurations

The CW4011 supports several two-way set-associative and direct-mapped D-cache configurations, as shown in Table B.4. Different physical D-cache configurations require different tag RAM and data RAM sizes, as well as different CW4011 core signal connections, as described in Section B.9, D-Cache Set Associative RAM Hookup. However, a larger D-cache design can usually emulate any of the smaller available configurations through the DS[1:0] bits in the CCC register. For example, a 16-Kbyte D-cache can be configured to emulate every other available configuration, provided that the tag memory is wide enough for the smaller configurations: to test a 1-Kbyte configuration, the D-cache tag needs 24 bits. This flexibility allows various configurations to be benchmarked in the simulation model and on the reference device chip. For more information on dynamic cache sizing, see Section 4.3.10, Configuration and Cache Control (CCC) Register.

The following sections describe the physical port interconnects between the CW4011 and the synchronous RAM models. These connections are made in Verilog or VHDL at the shell level; example Verilog and VHDL models are provided in the CW4011 deliverables database.

Table B.4 CW4011 D-Cache Sizes

  Two-way set-associative  Direct-mapped     CCC register
  D-cache (Kbytes)         D-cache (Kbytes)  DS[1:0] setting
  2                        1                 00
  4                        2                 01
  8                        4                 10
  16                       8                 11

B.7 CW4011 D-Cache Interface

The CW4011 D-cache interface is designed to implement a two-way set-associative D-cache of varying size. In the CW4011, the two data sets (set 0 and set 1) are interleaved across two data RAMs (bank A and bank B). This allows 64-bit reads and writes on the SCbus during cache refills while requiring only two banks of 32-bit-wide RAMs. Table B.5 shows this interleaving. For more detailed information about the CW4011 D-cache interface signals, see Section 7.7, Data Cache Interface.

Table B.5 D-Cache Data Interleaving

  Bank A                   Bank B
  Line 0, set 0, word 0    Line 0, set 1, word 0
  Line 0, set 1, word 1    Line 0, set 0, word 1
  Line 0, set 0, word 2    Line 0, set 1, word 2
  Line 0, set 1, word 3    Line 0, set 0, word 3
  Line 0, set 0, word 4    Line 0, set 1, word 4
  Line 0, set 1, word 5    Line 0, set 0, word 5
  Line 0, set 0, word 6    Line 0, set 1, word 6
  Line 0, set 1, word 7    Line 0, set 0, word 7
  Line 1, set 0, word 0    Line 1, set 1, word 0
  Line 1, set 1, word 1    Line 1, set 0, word 1
  Line 1, set 0, word 2    Line 1, set 1, word 2
  Line 1, set 1, word 3    Line 1, set 0, word 3
  Line 1, set 0, word 4    Line 1, set 1, word 4
  Line 1, set 1, word 5    Line 1, set 0, word 5
  Line 1, set 0, word 6    Line 1, set 1, word 6
  Line 1, set 1, word 7    Line 1, set 0, word 7
  ...                      ...
  Line n, set 0, word 4    Line n, set 1, word 4
  Line n, set 1, word 5    Line n, set 0, word 5
  Line n, set 0, word 6    Line n, set 1, word 6
  Line n, set 1, word 7    Line n, set 0, word 7

B.8 D-Cache Shell

In the CW4011 RTL shell model, the D-cache tag and data memories are gathered in a shell module, dcache_shell. Figure B.2 shows the CW4011_shell hierarchy.

Figure B.2 CW4011 D-Cache Shell RTL (hierarchy: CW4011_shell containing CW4011_core, dcache_shell with tag and data RAMs, and icache_shell with tag, data, and LRU RAMs)
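The interleaving in Table B.5 reduces to a one-line rule: bank A holds set 0 for even-numbered words and set 1 for odd-numbered words, and bank B holds the other set. The Python sketch below encodes that rule (the function name is hypothetical) and checks that a 64-bit refill of one set, words 2k and 2k+1, always touches both banks. The second check, that word w of set 0 and word w of set 1 also fall in different banks, is our reading of why the layout suits reading both sets in parallel, not a claim made in this manual.

```python
# Sketch: bank selection implied by Table B.5.


def bank_for(set_no: int, word: int) -> str:
    """Bank ('A' or 'B') holding the given set (0 or 1) and word (0-7) of a line."""
    return "A" if set_no == (word & 1) else "B"


# A 64-bit refill of one set writes words 2k and 2k+1 together:
# they always fall in different banks.
for s in (0, 1):
    for k in range(4):
        assert {bank_for(s, 2 * k), bank_for(s, 2 * k + 1)} == {"A", "B"}

# Word w of set 0 and word w of set 1 also fall in different banks.
for w in range(8):
    assert {bank_for(0, w), bank_for(1, w)} == {"A", "B"}

print([bank_for(0, w) for w in range(8)])  # ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B']
```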
B.9 D-Cache Set Associative RAM Hookup

Set-associative implementations require two tag RAMs and two data RAMs. Table B.6 and Table B.7 list the required RAM sizes. The RAMs should be word-write-enabled synchronous RAMs, such as the m10p111hs for the CW4011 (lcbg10p).

The following sections describe the connections from the CW4011 core D-cache interface ports to the ports of the D-cache RAM macros, assuming m10p111hs RAMs. Unspecified I/O can be considered unconnected. Unconnected core inputs should be tied deasserted; unconnected core outputs may be left open.

For larger cache sizes, the CW4011 core needs fewer tag inputs. Unused tag inputs are ignored according to the programming of the CCC register. If either or both cache sets are used as a RAM whose address is permanently fixed in the memory address space, the tag memory is not necessary; the tag inputs must then be tied low or high according to the address mapping. Refer to the HDL model of the cache shell for details of the connections to the memory macros.

Table B.6 Set-Associative, Writeback D-Cache RAM Requirements

  Set-assoc.  Tag RAM          Data RAM
  D-cache     Qty  Size        Qty  Size
  (Kbytes)         (bits)           (bits)
  2           2    32x24       2    256x32
  4           2    64x23       2    512x32
  8           2    128x22      2    1024x32
  16          2    256x21      2    2048x32

Table B.7 Set-Associative, Writethrough D-Cache RAM Requirements

  Set-assoc.  Tag RAM          Data RAM
  D-cache     Qty  Size        Qty  Size
  (Kbytes)         (bits)           (bits)
  2           2    32x23       2    256x32
  4           2    64x22       2    512x32
  8           2    128x21      2    1024x32
  16          2    256x20      2    2048x32

B.9.1 2-Kbyte D-Cache Set Associative, Writeback Connections

The following connections must be made between the CW4011 core and the synchronous RAM modules.

B.9.1.1 Tag RAM Set 0

  do[23:0]    dc0tagdop[23:0]
  di[23:0]    dc0tagdip[23:0]
  a[4:0]      dctagaddrp[9:5]
  (open)      dctagaddrp[12:10]
  clk         shell clock
  oe[23:0]    high
  we[23:0]    {{23{dtweap[1]}}, dtweap[0]}
  enable      high

B.9.1.2 Tag RAM Set 1

  do[23:0]    dc1tagdop[23:0]
  di[23:0]    dc1tagdip[23:0]
  a[4:0]      dctagaddrp[9:5]
  (open)      dctagaddrp[12:10]
  clk         shell clock
  oe[23:0]    high
  we[23:0]    {{23{dtwebp[1]}}, dtwebp[0]}
  enable      high
B.9.1.3 Data RAM Bank A

  do[31:0]    dcadatadop[31:0]
  di[31:0]    dcadatadip[31:0]
  a[7:0]      {dcdataaddrp[9:3], dcadataaddrp}
  (open)      dcdataaddrp[12:10]
  clk         shell clock
  oe[31:0]    high
  we[31:0]    {{8{dcweap[3]}}, {8{dcweap[2]}}, {8{dcweap[1]}}, {8{dcweap[0]}}}
  enable      high

B.9.1.4 Data RAM Bank B

  do[31:0]    dcbdatadop[31:0]
  di[31:0]    dcbdatadip[31:0]
  a[7:0]      {dcdataaddrp[9:3], dcbdataaddrp}
  (open)      dcdataaddrp[12:10]
  clk         shell clock
  oe[31:0]    high
  we[31:0]    {{8{dcwebp[3]}}, {8{dcwebp[2]}}, {8{dcwebp[1]}}, {8{dcwebp[0]}}}
  enable      high

B.9.2 4-Kbyte D-Cache Set Associative, Writeback Connections

The following connections must be made between the CW4011 core and the synchronous RAM modules.

B.9.2.1 Tag RAM Set 0

  do[22:0]    {dc0tagdop[23:3], dc0tagdop[1:0]}
  1'b0        dc0tagdop[2]
  di[22:0]    {dc0tagdip[23:3], dc0tagdip[1:0]}
  (open)      dc0tagdip[2]
  a[5:0]      dctagaddrp[10:5]
  (open)      dctagaddrp[12:11]
  clk         shell clock
  oe[22:0]    high
  we[22:0]    {{22{dtweap[1]}}, dtweap[0]}
  enable      high
B.9.2.2 Tag RAM Set 1

  do[22:0]    {dc1tagdop[23:3], dc1tagdop[1:0]}
  1'b0        dc1tagdop[2]
  di[22:0]    {dc1tagdip[23:3], dc1tagdip[1:0]}
  (open)      dc1tagdip[2]
  a[5:0]      dctagaddrp[10:5]
  (open)      dctagaddrp[12:11]
  clk         shell clock
  oe[22:0]    high
  we[22:0]    {{22{dtwebp[1]}}, dtwebp[0]}
  enable      high

B.9.2.3 Data RAM Bank A

  do[31:0]    dcadatadop[31:0]
  di[31:0]    dcadatadip[31:0]
  a[8:0]      {dcdataaddrp[10:3], dcadataaddrp}
  (open)      dcdataaddrp[12:11]
  clk         shell clock
  oe[31:0]    high
  we[31:0]    {{8{dcweap[3]}}, {8{dcweap[2]}}, {8{dcweap[1]}}, {8{dcweap[0]}}}
  enable      high

B.9.2.4 Data RAM Bank B

  do[31:0]    dcbdatadop[31:0]
  di[31:0]    dcbdatadip[31:0]
  a[8:0]      {dcdataaddrp[10:3], dcbdataaddrp}
  (open)      dcdataaddrp[12:11]
  clk         shell clock
  oe[31:0]    high
  we[31:0]    {{8{dcwebp[3]}}, {8{dcwebp[2]}}, {8{dcwebp[1]}}, {8{dcwebp[0]}}}
  enable      high
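Tables B.6 and B.7 at the start of Section B.9 follow the same pattern as the I-cache tables, with one extra tag bit (the WB bit) in writeback configurations and two 32-bit data banks shared by both sets. The Python sketch below reproduces both tables. Its rules (32-byte lines, 33 - log2(bytes per set) tag bits, plus one bit for writeback) are inferred from the tables rather than stated in the manual, and the function name is hypothetical.

```python
# Sketch: reproduce Tables B.6 (writeback) and B.7 (writethrough).
# Assumptions (inferred): 32-byte lines, tag width 33 - log2(bytes per set),
# one extra WB bit for writeback, and two 32-bit banks holding both sets.
from math import log2

LINE_BYTES = 32


def dcache_sa_requirements(size_kb: int, writeback: bool) -> dict:
    """(quantity, size) of the tag and data RAMs for a two-way D-cache."""
    total_bytes = size_kb * 1024
    bytes_per_set = total_bytes // 2
    lines = bytes_per_set // LINE_BYTES
    tag_width = 33 - int(log2(bytes_per_set)) + (1 if writeback else 0)
    bank_depth = total_bytes // 8  # two banks of 4-byte words
    return {"tag": (2, f"{lines}x{tag_width}"), "data": (2, f"{bank_depth}x32")}


for kb in (2, 4, 8, 16):
    print(kb, dcache_sa_requirements(kb, True), dcache_sa_requirements(kb, False))
```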
B.9.3 8-Kbyte D-Cache Set Associative, Writeback Connections

The following connections must be made between the CW4011 core and the synchronous RAM modules.

B.9.3.1 Tag RAM Set 0

  do[21:0]    {dc0tagdop[23:4], dc0tagdop[1:0]}
  2'b0        dc0tagdop[3:2]
  di[21:0]    {dc0tagdip[23:4], dc0tagdip[1:0]}
  (open)      dc0tagdip[3:2]
  a[6:0]      dctagaddrp[11:5]
  (open)      dctagaddrp[12]
  clk         shell clock
  oe[21:0]    high
  we[21:0]    {{21{dtweap[1]}}, dtweap[0]}
  enable      high

B.9.3.2 Tag RAM Set 1

  do[21:0]    {dc1tagdop[23:4], dc1tagdop[1:0]}
  2'b0        dc1tagdop[3:2]
  di[21:0]    {dc1tagdip[23:4], dc1tagdip[1:0]}
  (open)      dc1tagdip[3:2]
  a[6:0]      dctagaddrp[11:5]
  (open)      dctagaddrp[12]
  clk         shell clock
  oe[21:0]    high
  we[21:0]    {{21{dtwebp[1]}}, dtwebp[0]}
  enable      high

B.9.3.3 Data RAM Bank A

  do[31:0]    dcadatadop[31:0]
  di[31:0]    dcadatadip[31:0]
  a[9:0]      {dcdataaddrp[11:3], dcadataaddrp}
  (open)      dcdataaddrp[12]
  clk         shell clock
  oe[31:0]    high
  we[31:0]    {{8{dcweap[3]}}, {8{dcweap[2]}}, {8{dcweap[1]}}, {8{dcweap[0]}}}
  enable      high

B.9.3.4 Data RAM Bank B

  do[31:0]    dcbdatadop[31:0]
  di[31:0]    dcbdatadip[31:0]
  a[9:0]      {dcdataaddrp[11:3], dcbdataaddrp}
  (open)      dcdataaddrp[12]
  clk         shell clock
  oe[31:0]    high
  we[31:0]    {{8{dcwebp[3]}}, {8{dcwebp[2]}}, {8{dcwebp[1]}}, {8{dcwebp[0]}}}
  enable      high

B.9.4 16-Kbyte D-Cache Set Associative, Writeback Connections

The following connections must be made between the CW4011 core and the synchronous RAM modules.

B.9.4.1 Tag RAM Set 0

  do[20:0]    {dc0tagdop[23:5], dc0tagdop[1:0]}
  3'b0        dc0tagdop[4:2]
  di[20:0]    {dc0tagdip[23:5], dc0tagdip[1:0]}
  (open)      dc0tagdip[4:2]
  a[7:0]      dctagaddrp[12:5]
  clk         shell clock
  oe[20:0]    high
  we[20:0]    {{20{dtweap[1]}}, dtweap[0]}
  enable      high

B.9.4.2 Tag RAM Set 1

  do[20:0]    {dc1tagdop[23:5], dc1tagdop[1:0]}
  3'b0        dc1tagdop[4:2]
  di[20:0]    {dc1tagdip[23:5], dc1tagdip[1:0]}
  (open)      dc1tagdip[4:2]
  a[7:0]      dctagaddrp[12:5]
  clk         shell clock
  oe[20:0]    high
b-28 cache sizing and design concerns we[20:0] ? {20{dtwebp[1]},dtwebp[0]} enable ? high b.9.4.3 data ram bank a do[31:0] ? dcadatadop[31:0] di[31:0] ? dcadatadip[31:0] a[10:0] ? {dcdataaddrp[12:3],dcadataaddrp} clk ? shell clock oe[31:0] ? high we[31:0] ? {8{dcweap[3]},8{dcweap[2]},8{dcweap[1]},8{dcweap[0]}} enable ? high b.9.4.4 data ram bank b do[31:0] ? dcbdatadop[31:0] di[31:0] ? dcbdatadip[31:0] a[10:0] ? {dcdataaddrp[12:3],dcbdataaddrp} clk ? shell clock oe[31:0] ? high we[31:0] ? {8{dcwebp[3]},8{dcwebp[2]},8{dcwebp[1]},8{dcwebp[0]}} enable ? high b.9.5 2-kbyte d-cache set associative, writethrough connections for write through cache, the wb bit in the tag is unused. the dc0tagdop[0] and dc1tagdop[0] inputs of the core should be tied low. the following connections between the CW4011 core and the synchronous ram modules need to be made. b.9.5.1 tag ram set 0 do[22:0] ? {dc0tagdop[23:1]} 1b0 ? dc0tagdop[0] di[22:0] ? dc0tagdip[23:1] (open) ? dc0tagdip[0] a[4:0] ? dctagaddrp[9:5] (open) ? dctagaddrp[12:10] clk ? shell clock
d-cache set associative ram hookup b-29 oe[22:0] ? high we[22:0] ? 23{dtweap[1]} enable ? high b.9.5.2 tag ram set 1 do[22:0] ? {dc1tagdop[23:1]} 1b0 ? dc1tagdop[0] di[22:0] ? dc1tagdip[23:1] (open) ? dc1tagdip[0] a[4:0] ? dctagaddrp[9:5] (open) ? dctagaddrp[12:10] clk ? shell clock oe[22:0] ? high we[22:0] ? 23{dtwebp[1]} enable ? high b.9.5.3 data ram bank a do[31:0] ? dcadatadop[31:0] di[31:0] ? dcadatadip[31:0] a[7:0] ? {dcdataaddrp[9:3],dcadataaddrp} (open) ? dcdataaddrp[12:10] clk ? shell clock oe[31:0] ? high we[31:0] ? {8{dcweap[3]},8{dcweap[2]},8{dcweap[1]},8{dcweap[0]}} enable ? high b.9.5.4 data ram bank b do[31:0] ? dcbdatadop[31:0] di[31:0] ? dcbdatadip[31:0] a[7:0] ? {dcdataaddrp[9:3],dcbdataaddrp} (open) ? dcdataaddrp[12:10] clk ? shell clock oe[31:0] ? high we[31:0] ? {8{dcwebp[3]},8{dcwebp[2]},8{dcwebp[1]},8{dcwebp[0]}} enable ? high
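Across these hookups, the tag RAM stores only the upper tag-field bits plus the status bits, and the core-side bits between them are tied to 0. The following sketch rebuilds the 24-bit value the core sees on its tag input (written dcNtagdop, with N = 0 or 1) from a tag-RAM DO word, which may help when checking a RAM model against the lists above. It is inferred from the mappings in this appendix, not from a documented core interface; `low_bit` is the lowest stored tag-field bit (3 for B.9.2, 4 for B.9.3, 5 for B.9.4, and 2 for B.9.5), and the function names are hypothetical.

```python
def core_tag_from_ram(ram_do: int, low_bit: int, writeback: bool) -> int:
    """Rebuild dcNtagdop[23:0] from a tag-RAM DO word.

    Writeback:    DO = {dcNtagdop[23:low_bit], dcNtagdop[1:0]}
    Writethrough: DO = {dcNtagdop[23:low_bit], dcNtagdop[1]}, dcNtagdop[0] = 0
    In both cases dcNtagdop[low_bit-1:2] are tied to 0.
    """
    if not 2 <= low_bit <= 5:
        raise ValueError("low_bit must be between 2 and 5")
    status_bits = 2 if writeback else 1
    field_bits = 24 - low_bit
    field = (ram_do >> status_bits) & ((1 << field_bits) - 1)
    if writeback:
        status = ram_do & 0b11          # bits [1:0] come straight from the RAM
    else:
        status = (ram_do & 0b1) << 1    # RAM bit 0 is dcNtagdop[1]; WB bit tied low
    return (field << low_bit) | status


def tag_ram_width(low_bit: int, writeback: bool) -> int:
    """Width of the tag-RAM word implied by a given mapping."""
    return (24 - low_bit) + (2 if writeback else 1)


if __name__ == "__main__":
    # B.9.5 (2 Kbytes, writethrough): an all-ones 23-bit RAM word.
    print(hex(core_tag_from_ram(0x7FFFFF, 2, writeback=False)))
```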
B.9.6 4-Kbyte D-Cache Set Associative, Writethrough Connections
For a writethrough cache, the WB bit in the tag is unused, so the dc0tagdop[0] and dc1tagdop[0] inputs of the core should be tied low. The following connections must be made between the CW4011 core and the synchronous RAM modules.

B.9.6.1 Tag RAM Set 0
DO[21:0] ↔ {dc0tagdop[23:3],dc0tagdop[1]}
1'b0 ↔ dc0tagdop[0]
1'b0 ↔ dc0tagdop[2]
DI[21:0] ↔ {dc0tagdip[23:3],dc0tagdip[1]}
(open) ↔ dc0tagdip[0]
(open) ↔ dc0tagdip[2]
A[5:0] ↔ dctagaddrp[10:5]
(open) ↔ dctagaddrp[12:11]
CLK ↔ shell clock
OE[21:0] ↔ high
WE[21:0] ↔ {22{dtweap[1]}}
ENABLE ↔ high

B.9.6.2 Tag RAM Set 1
DO[21:0] ↔ {dc1tagdop[23:3],dc1tagdop[1]}
1'b0 ↔ dc1tagdop[0]
1'b0 ↔ dc1tagdop[2]
DI[21:0] ↔ {dc1tagdip[23:3],dc1tagdip[1]}
(open) ↔ dc1tagdip[0]
(open) ↔ dc1tagdip[2]
A[5:0] ↔ dctagaddrp[10:5]
(open) ↔ dctagaddrp[12:11]
CLK ↔ shell clock
OE[21:0] ↔ high
WE[21:0] ↔ {22{dtwebp[1]}}
ENABLE ↔ high
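The tag RAM address width follows from the size of one set. With 32-byte lines (inferred from dctagaddrp starting at bit 5), a set of S bytes needs log2(S) - 5 index bits, and the dctagaddrp bits above that range are left open. The following sketch computes both the connected range and the open range, plus the tag-RAM index that a byte address selects; the function and constant names are hypothetical.

```python
LINE_SHIFT = 5   # 32-byte lines: dctagaddrp[4:0] is never used (inferred)
MAX_BIT = 12     # highest dctagaddrp bit the core provides


def log2_exact(n: int) -> int:
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def tag_index_bits(set_bytes: int):
    """Return ((hi, lo), open_bits) for the dctagaddrp bits wired to A[].

    open_bits is the (hi, lo) range left unconnected, or None if all are used.
    """
    if not 1024 <= set_bytes <= 8192:
        raise ValueError("set size must be 1 to 8 Kbytes")
    top = log2_exact(set_bytes) - 1
    open_bits = (MAX_BIT, top + 1) if top < MAX_BIT else None
    return (top, LINE_SHIFT), open_bits


def tag_ram_address(byte_addr: int, set_bytes: int) -> int:
    """Tag-RAM index selected by a byte address: byte_addr[top:5]."""
    lines = set_bytes >> LINE_SHIFT
    return (byte_addr >> LINE_SHIFT) & (lines - 1)


if __name__ == "__main__":
    # B.9.6: 4-Kbyte two-way cache, so each set holds 2 Kbytes.
    print(tag_index_bits(2048))
```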
B.9.6.3 Data RAM Bank A
DO[31:0] ↔ dcadatadop[31:0]
DI[31:0] ↔ dcadatadip[31:0]
A[8:0] ↔ {dcdataaddrp[10:3],dcadataaddrp}
(open) ↔ dcdataaddrp[12:11]
CLK ↔ shell clock
OE[31:0] ↔ high
WE[31:0] ↔ {{8{dcweap[3]}},{8{dcweap[2]}},{8{dcweap[1]}},{8{dcweap[0]}}}
ENABLE ↔ high

B.9.6.4 Data RAM Bank B
DO[31:0] ↔ dcbdatadop[31:0]
DI[31:0] ↔ dcbdatadip[31:0]
A[8:0] ↔ {dcdataaddrp[10:3],dcbdataaddrp}
(open) ↔ dcdataaddrp[12:11]
CLK ↔ shell clock
OE[31:0] ↔ high
WE[31:0] ↔ {{8{dcwebp[3]}},{8{dcwebp[2]}},{8{dcwebp[1]}},{8{dcwebp[0]}}}
ENABLE ↔ high

B.9.7 8-Kbyte D-Cache Set Associative, Writethrough Connections
For a writethrough cache, the WB bit in the tag is unused, so the dc0tagdop[0] and dc1tagdop[0] inputs of the core should be tied low. The following connections must be made between the CW4011 core and the synchronous RAM modules.
B.9.7.1 Tag RAM Set 0
DO[20:0] ↔ {dc0tagdop[23:4],dc0tagdop[1]}
1'b0 ↔ dc0tagdop[0]
2'b0 ↔ dc0tagdop[3:2]
DI[20:0] ↔ {dc0tagdip[23:4],dc0tagdip[1]}
(open) ↔ dc0tagdip[0]
(open) ↔ dc0tagdip[3:2]
A[6:0] ↔ dctagaddrp[11:5]
(open) ↔ dctagaddrp[12]
CLK ↔ shell clock
OE[20:0] ↔ high
WE[20:0] ↔ {21{dtweap[1]}}
ENABLE ↔ high

B.9.7.2 Tag RAM Set 1
DO[20:0] ↔ {dc1tagdop[23:4],dc1tagdop[1]}
1'b0 ↔ dc1tagdop[0]
2'b0 ↔ dc1tagdop[3:2]
DI[20:0] ↔ {dc1tagdip[23:4],dc1tagdip[1]}
(open) ↔ dc1tagdip[0]
(open) ↔ dc1tagdip[3:2]
A[6:0] ↔ dctagaddrp[11:5]
(open) ↔ dctagaddrp[12]
CLK ↔ shell clock
OE[20:0] ↔ high
WE[20:0] ↔ {21{dtwebp[1]}}
ENABLE ↔ high

B.9.7.3 Data RAM Bank A
DO[31:0] ↔ dcadatadop[31:0]
DI[31:0] ↔ dcadatadip[31:0]
A[9:0] ↔ {dcdataaddrp[11:3],dcadataaddrp}
(open) ↔ dcdataaddrp[12]
CLK ↔ shell clock
OE[31:0] ↔ high
WE[31:0] ↔ {{8{dcweap[3]}},{8{dcweap[2]}},{8{dcweap[1]}},{8{dcweap[0]}}}
ENABLE ↔ high
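In the set-associative hookups, the data RAM address is the doubleword index dcdataaddrp[top:3] with a set-select bit (dcadataaddrp or dcbdataaddrp) appended as the least-significant bit; the direct-mapped hookups in Section B.10 use the doubleword index alone. The following sketch forms that address from a byte address. It assumes, as the connection lists suggest, that each data bank is 32 bits wide and that banks A and B together hold one 64-bit doubleword; `data_ram_address` is a hypothetical name.

```python
def data_ram_address(byte_addr: int, cache_bytes: int, ways: int, way: int = 0) -> int:
    """RAM A[] value for one 32-bit data bank.

    ways == 2: A = {byte_addr[top:3], way}   (set-select bit is the LSB)
    ways == 1: A = byte_addr[top:3]
    where top = log2(cache_bytes / ways) - 1 (sizes are powers of two).
    """
    if ways not in (1, 2):
        raise ValueError("the hookups cover direct-mapped and two-way caches only")
    if way not in range(ways):
        raise ValueError("way out of range")
    set_bytes = cache_bytes // ways
    top = set_bytes.bit_length() - 2                          # log2(set_bytes) - 1
    dword_index = (byte_addr >> 3) & ((1 << (top - 2)) - 1)   # bits [top:3]
    return dword_index if ways == 1 else (dword_index << 1) | way


if __name__ == "__main__":
    # B.9.7: 8-Kbyte two-way cache, A[9:0] = {dcdataaddrp[11:3], set bit}.
    print(data_ram_address(0xFF8, 8192, 2, way=1))
```

Placing the set-select bit in the LSB means both ways of a given doubleword sit in adjacent RAM words of the same bank.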
B.9.7.4 Data RAM Bank B
DO[31:0] ↔ dcbdatadop[31:0]
DI[31:0] ↔ dcbdatadip[31:0]
A[9:0] ↔ {dcdataaddrp[11:3],dcbdataaddrp}
(open) ↔ dcdataaddrp[12]
CLK ↔ shell clock
OE[31:0] ↔ high
WE[31:0] ↔ {{8{dcwebp[3]}},{8{dcwebp[2]}},{8{dcwebp[1]}},{8{dcwebp[0]}}}
ENABLE ↔ high

B.9.8 16-Kbyte D-Cache Set Associative, Writethrough Connections
For a writethrough cache, the WB bit in the tag is unused, so the dc0tagdop[0] and dc1tagdop[0] inputs of the core should be tied low. The following connections must be made between the CW4011 core and the synchronous RAM modules.

B.9.8.1 Tag RAM Set 0
DO[19:0] ↔ {dc0tagdop[23:5],dc0tagdop[1]}
1'b0 ↔ dc0tagdop[0]
3'b0 ↔ dc0tagdop[4:2]
DI[19:0] ↔ {dc0tagdip[23:5],dc0tagdip[1]}
(open) ↔ dc0tagdip[0]
(open) ↔ dc0tagdip[4:2]
A[7:0] ↔ dctagaddrp[12:5]
CLK ↔ shell clock
OE[19:0] ↔ high
WE[19:0] ↔ {20{dtweap[1]}}
ENABLE ↔ high
B.9.8.2 Tag RAM Set 1
DO[19:0] ↔ {dc1tagdop[23:5],dc1tagdop[1]}
1'b0 ↔ dc1tagdop[0]
3'b0 ↔ dc1tagdop[4:2]
DI[19:0] ↔ {dc1tagdip[23:5],dc1tagdip[1]}
(open) ↔ dc1tagdip[0]
(open) ↔ dc1tagdip[4:2]
A[7:0] ↔ dctagaddrp[12:5]
CLK ↔ shell clock
OE[19:0] ↔ high
WE[19:0] ↔ {20{dtwebp[1]}}
ENABLE ↔ high

B.9.8.3 Data RAM Bank A
DO[31:0] ↔ dcadatadop[31:0]
DI[31:0] ↔ dcadatadip[31:0]
A[10:0] ↔ {dcdataaddrp[12:3],dcadataaddrp}
CLK ↔ shell clock
OE[31:0] ↔ high
WE[31:0] ↔ {{8{dcweap[3]}},{8{dcweap[2]}},{8{dcweap[1]}},{8{dcweap[0]}}}
ENABLE ↔ high

B.9.8.4 Data RAM Bank B
DO[31:0] ↔ dcbdatadop[31:0]
DI[31:0] ↔ dcbdatadip[31:0]
A[10:0] ↔ {dcdataaddrp[12:3],dcbdataaddrp}
CLK ↔ shell clock
OE[31:0] ↔ high
WE[31:0] ↔ {{8{dcwebp[3]}},{8{dcwebp[2]}},{8{dcwebp[1]}},{8{dcwebp[0]}}}
ENABLE ↔ high
B.10 D-Cache Direct-Mapped RAM Hookup
Direct-mapped D-cache implementations require one tag RAM and two data RAMs. Table B.8 and Table B.9 list the required RAM sizes. The RAMs used should be word-write-enabled synchronous RAMs, such as the m10p111hs for the CW4011 (lcbg10p).

In a direct-mapped cache configuration, the inputs for one set (set 1) are not used. All unused inputs are ignored internally, but they should be tied to their deasserted state.

The following sections describe the connections from the CW4011 core D-cache interface ports to the D-cache RAM macro ports, assuming m10p111hs RAMs. Unspecified I/O can be considered unconnected; unconnected core inputs should still be tied deasserted.

Table B.8 Direct-Mapped, Writeback, D-Cache RAM Requirements

D-Cache Size   Tag RAM                    Data RAM
(Kbytes)       Quantity   Size (bits)     Quantity   Size (bits)
1              1          32 x 24         2          128 x 32
2              1          64 x 23         2          256 x 32
4              1          128 x 22        2          512 x 32
8              1          256 x 21        2          1024 x 32

Table B.9 Direct-Mapped, Writethrough, D-Cache RAM Requirements

D-Cache Size   Tag RAM                    Data RAM
(Kbytes)       Quantity   Size (bits)     Quantity   Size (bits)
1              1          32 x 23         2          128 x 32
2              1          64 x 22         2          256 x 32
4              1          128 x 21        2          512 x 32
8              1          256 x 20        2          1024 x 32
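The entries in Tables B.8 and B.9 follow a simple pattern. The following sketch reproduces them under stated assumptions: 32-bit physical addresses, 32-byte lines, and two 32-bit data banks. The tag width is taken to be the number of address bits above the per-set size, plus two status bits for writeback or one for writethrough; this matches the one-bit difference between the two tables and the unused WB bit. It is a cross-check written for this page, not an LSI Logic tool.

```python
def ram_requirements(cache_kbytes: int, ways: int, writeback: bool):
    """Return ((tag_qty, tag_depth, tag_width), (data_qty, data_depth, data_width))."""
    cache_bytes = cache_kbytes * 1024
    set_bytes = cache_bytes // ways
    log2_set = set_bytes.bit_length() - 1            # sizes are powers of two
    tag_depth = set_bytes // 32                      # one tag per 32-byte line
    tag_width = (32 - log2_set) + (2 if writeback else 1)
    data_depth = cache_bytes // 8                    # 2 banks x 4 bytes per entry
    return (ways, tag_depth, tag_width), (2, data_depth, 32)


if __name__ == "__main__":
    for wb, name in ((True, "B.8 (writeback)"), (False, "B.9 (writethrough)")):
        print(f"Table {name}")
        for kb in (1, 2, 4, 8):
            (tq, td, tw), (dq, dd, dw) = ram_requirements(kb, 1, wb)
            print(f"  {kb} Kbytes: tag {tq} x {td}x{tw}, data {dq} x {dd}x{dw}")
```

The same function with `ways=2` gives the per-set tag RAMs of the set-associative hookups in Section B.9, for example two 256 x 21 tag RAMs and two 2048 x 32 data RAMs for the 16-Kbyte writeback case.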
B.10.1 1-Kbyte D-Cache Direct-Mapped, Writeback Connections
The following connections must be made between the CW4011 core and the synchronous RAM modules.

B.10.1.1 Tag RAM Set 0
DO[23:0] ↔ dc0tagdop[23:0]
DI[23:0] ↔ dc0tagdip[23:0]
A[4:0] ↔ dctagaddrp[9:5]
(open) ↔ dctagaddrp[12:10]
CLK ↔ shell clock
OE[23:0] ↔ high
WE[23:0] ↔ {{23{dtweap[1]}},dtweap[0]}
ENABLE ↔ high

B.10.1.2 Data RAM Bank A
DO[31:0] ↔ dcadatadop[31:0]
DI[31:0] ↔ dcadatadip[31:0]
A[6:0] ↔ dcdataaddrp[9:3]
(open) ↔ dcdataaddrp[12:10]
CLK ↔ shell clock
OE[31:0] ↔ high
WE[31:0] ↔ {{8{dcweap[3]}},{8{dcweap[2]}},{8{dcweap[1]}},{8{dcweap[0]}}}
ENABLE ↔ high

B.10.1.3 Data RAM Bank B
DO[31:0] ↔ dcbdatadop[31:0]
DI[31:0] ↔ dcbdatadip[31:0]
A[6:0] ↔ dcdataaddrp[9:3]
(open) ↔ dcdataaddrp[12:10]
CLK ↔ shell clock
OE[31:0] ↔ high
WE[31:0] ↔ {{8{dcwebp[3]}},{8{dcwebp[2]}},{8{dcwebp[1]}},{8{dcwebp[0]}}}
ENABLE ↔ high
B.10.2 2-Kbyte D-Cache Direct-Mapped, Writeback Connections
The following connections must be made between the CW4011 core and the synchronous RAM modules.

B.10.2.1 Tag RAM Set 0
DO[22:0] ↔ {dc0tagdop[23:3],dc0tagdop[1:0]}
1'b0 ↔ dc0tagdop[2]
DI[22:0] ↔ {dc0tagdip[23:3],dc0tagdip[1:0]}
(open) ↔ dc0tagdip[2]
A[5:0] ↔ dctagaddrp[10:5]
(open) ↔ dctagaddrp[12:11]
CLK ↔ shell clock
OE[22:0] ↔ high
WE[22:0] ↔ {{22{dtweap[1]}},dtweap[0]}
ENABLE ↔ high

B.10.2.2 Data RAM Bank A
DO[31:0] ↔ dcadatadop[31:0]
DI[31:0] ↔ dcadatadip[31:0]
A[7:0] ↔ dcdataaddrp[10:3]
(open) ↔ dcdataaddrp[12:11]
CLK ↔ shell clock
OE[31:0] ↔ high
WE[31:0] ↔ {{8{dcweap[3]}},{8{dcweap[2]}},{8{dcweap[1]}},{8{dcweap[0]}}}
ENABLE ↔ high

B.10.2.3 Data RAM Bank B
DO[31:0] ↔ dcbdatadop[31:0]
DI[31:0] ↔ dcbdatadip[31:0]
A[7:0] ↔ dcdataaddrp[10:3]
(open) ↔ dcdataaddrp[12:11]
CLK ↔ shell clock
OE[31:0] ↔ high
WE[31:0] ↔ {{8{dcwebp[3]}},{8{dcwebp[2]}},{8{dcwebp[1]}},{8{dcwebp[0]}}}
ENABLE ↔ high
B.10.3 4-Kbyte D-Cache Direct-Mapped, Writeback Connections
The following connections must be made between the CW4011 core and the synchronous RAM modules.

B.10.3.1 Tag RAM Set 0
DO[21:0] ↔ {dc0tagdop[23:4],dc0tagdop[1:0]}
2'b0 ↔ dc0tagdop[3:2]
DI[21:0] ↔ {dc0tagdip[23:4],dc0tagdip[1:0]}
(open) ↔ dc0tagdip[3:2]
A[6:0] ↔ dctagaddrp[11:5]
(open) ↔ dctagaddrp[12]
CLK ↔ shell clock
OE[21:0] ↔ high
WE[21:0] ↔ {{21{dtweap[1]}},dtweap[0]}
ENABLE ↔ high

B.10.3.2 Data RAM Bank A
DO[31:0] ↔ dcadatadop[31:0]
DI[31:0] ↔ dcadatadip[31:0]
A[8:0] ↔ dcdataaddrp[11:3]
(open) ↔ dcdataaddrp[12]
CLK ↔ shell clock
OE[31:0] ↔ high
WE[31:0] ↔ {{8{dcweap[3]}},{8{dcweap[2]}},{8{dcweap[1]}},{8{dcweap[0]}}}
ENABLE ↔ high

B.10.3.3 Data RAM Bank B
DO[31:0] ↔ dcbdatadop[31:0]
DI[31:0] ↔ dcbdatadip[31:0]
A[8:0] ↔ dcdataaddrp[11:3]
(open) ↔ dcdataaddrp[12]
CLK ↔ shell clock
OE[31:0] ↔ high
WE[31:0] ↔ {{8{dcwebp[3]}},{8{dcwebp[2]}},{8{dcwebp[1]}},{8{dcwebp[0]}}}
ENABLE ↔ high
B.10.4 8-Kbyte D-Cache Direct-Mapped, Writeback Connections
The following connections must be made between the CW4011 core and the synchronous RAM modules.

B.10.4.1 Tag RAM Set 0
DO[20:0] ↔ {dc0tagdop[23:5],dc0tagdop[1:0]}
3'b0 ↔ dc0tagdop[4:2]
DI[20:0] ↔ {dc0tagdip[23:5],dc0tagdip[1:0]}
(open) ↔ dc0tagdip[4:2]
A[7:0] ↔ dctagaddrp[12:5]
CLK ↔ shell clock
OE[20:0] ↔ high
WE[20:0] ↔ {{20{dtweap[1]}},dtweap[0]}
ENABLE ↔ high

B.10.4.2 Data RAM Bank A
DO[31:0] ↔ dcadatadop[31:0]
DI[31:0] ↔ dcadatadip[31:0]
A[9:0] ↔ dcdataaddrp[12:3]
CLK ↔ shell clock
OE[31:0] ↔ high
WE[31:0] ↔ {{8{dcweap[3]}},{8{dcweap[2]}},{8{dcweap[1]}},{8{dcweap[0]}}}
ENABLE ↔ high

B.10.4.3 Data RAM Bank B
DO[31:0] ↔ dcbdatadop[31:0]
DI[31:0] ↔ dcbdatadip[31:0]
A[9:0] ↔ dcdataaddrp[12:3]
CLK ↔ shell clock
OE[31:0] ↔ high
WE[31:0] ↔ {{8{dcwebp[3]}},{8{dcwebp[2]}},{8{dcwebp[1]}},{8{dcwebp[0]}}}
ENABLE ↔ high
B.10.5 1-Kbyte D-Cache Direct-Mapped, Writethrough Connections
For a writethrough cache, the WB bit in the tag is unused, so the dc0tagdop[0] and dc1tagdop[0] inputs of the core should be tied low. The following connections must be made between the CW4011 core and the synchronous RAM modules.

B.10.5.1 Tag RAM Set 0
DO[22:0] ↔ dc0tagdop[23:1]
1'b0 ↔ dc0tagdop[0]
DI[22:0] ↔ dc0tagdip[23:1]
(open) ↔ dc0tagdip[0]
A[4:0] ↔ dctagaddrp[9:5]
(open) ↔ dctagaddrp[12:10]
CLK ↔ shell clock
OE[22:0] ↔ high
WE[22:0] ↔ {23{dtweap[1]}}
ENABLE ↔ high

B.10.5.2 Data RAM Bank A
DO[31:0] ↔ dcadatadop[31:0]
DI[31:0] ↔ dcadatadip[31:0]
A[6:0] ↔ dcdataaddrp[9:3]
(open) ↔ dcdataaddrp[12:10]
CLK ↔ shell clock
OE[31:0] ↔ high
WE[31:0] ↔ {{8{dcweap[3]}},{8{dcweap[2]}},{8{dcweap[1]}},{8{dcweap[0]}}}
ENABLE ↔ high
B.10.5.3 Data RAM Bank B
DO[31:0] ↔ dcbdatadop[31:0]
DI[31:0] ↔ dcbdatadip[31:0]
A[6:0] ↔ dcdataaddrp[9:3]
(open) ↔ dcdataaddrp[12:10]
CLK ↔ shell clock
OE[31:0] ↔ high
WE[31:0] ↔ {{8{dcwebp[3]}},{8{dcwebp[2]}},{8{dcwebp[1]}},{8{dcwebp[0]}}}
ENABLE ↔ high

B.10.6 2-Kbyte D-Cache Direct-Mapped, Writethrough Connections
For a writethrough cache, the WB bit in the tag is unused, so the dc0tagdop[0] and dc1tagdop[0] inputs of the core should be tied low. The following connections must be made between the CW4011 core and the synchronous RAM modules.

B.10.6.1 Tag RAM Set 0
DO[21:0] ↔ {dc0tagdop[23:3],dc0tagdop[1]}
1'b0 ↔ dc0tagdop[0]
1'b0 ↔ dc0tagdop[2]
DI[21:0] ↔ {dc0tagdip[23:3],dc0tagdip[1]}
(open) ↔ dc0tagdip[0]
(open) ↔ dc0tagdip[2]
A[5:0] ↔ dctagaddrp[10:5]
(open) ↔ dctagaddrp[12:11]
CLK ↔ shell clock
OE[21:0] ↔ high
WE[21:0] ↔ {22{dtweap[1]}}
ENABLE ↔ high
B.10.6.2 Data RAM Bank A
DO[31:0] ↔ dcadatadop[31:0]
DI[31:0] ↔ dcadatadip[31:0]
A[7:0] ↔ dcdataaddrp[10:3]
(open) ↔ dcdataaddrp[12:11]
CLK ↔ shell clock
OE[31:0] ↔ high
WE[31:0] ↔ {{8{dcweap[3]}},{8{dcweap[2]}},{8{dcweap[1]}},{8{dcweap[0]}}}
ENABLE ↔ high

B.10.6.3 Data RAM Bank B
DO[31:0] ↔ dcbdatadop[31:0]
DI[31:0] ↔ dcbdatadip[31:0]
A[7:0] ↔ dcdataaddrp[10:3]
(open) ↔ dcdataaddrp[12:11]
CLK ↔ shell clock
OE[31:0] ↔ high
WE[31:0] ↔ {{8{dcwebp[3]}},{8{dcwebp[2]}},{8{dcwebp[1]}},{8{dcwebp[0]}}}
ENABLE ↔ high

B.10.7 4-Kbyte D-Cache Direct-Mapped, Writethrough Connections
For a writethrough cache, the WB bit in the tag is unused, so the dc0tagdop[0] and dc1tagdop[0] inputs of the core should be tied low. The following connections must be made between the CW4011 core and the synchronous RAM modules.
B.10.7.1 Tag RAM Set 0
DO[20:0] ↔ {dc0tagdop[23:4],dc0tagdop[1]}
1'b0 ↔ dc0tagdop[0]
2'b0 ↔ dc0tagdop[3:2]
DI[20:0] ↔ {dc0tagdip[23:4],dc0tagdip[1]}
(open) ↔ dc0tagdip[0]
(open) ↔ dc0tagdip[3:2]
A[6:0] ↔ dctagaddrp[11:5]
(open) ↔ dctagaddrp[12]
CLK ↔ shell clock
OE[20:0] ↔ high
WE[20:0] ↔ {21{dtweap[1]}}
ENABLE ↔ high

B.10.7.2 Data RAM Bank A
DO[31:0] ↔ dcadatadop[31:0]
DI[31:0] ↔ dcadatadip[31:0]
A[8:0] ↔ dcdataaddrp[11:3]
(open) ↔ dcdataaddrp[12]
CLK ↔ shell clock
OE[31:0] ↔ high
WE[31:0] ↔ {{8{dcweap[3]}},{8{dcweap[2]}},{8{dcweap[1]}},{8{dcweap[0]}}}
ENABLE ↔ high

B.10.7.3 Data RAM Bank B
DO[31:0] ↔ dcbdatadop[31:0]
DI[31:0] ↔ dcbdatadip[31:0]
A[8:0] ↔ dcdataaddrp[11:3]
(open) ↔ dcdataaddrp[12]
CLK ↔ shell clock
OE[31:0] ↔ high
WE[31:0] ↔ {{8{dcwebp[3]}},{8{dcwebp[2]}},{8{dcwebp[1]}},{8{dcwebp[0]}}}
ENABLE ↔ high
B.10.8 8-Kbyte D-Cache Direct-Mapped, Writethrough Connections
For a writethrough cache, the WB bit in the tag is unused, so the dc0tagdop[0] and dc1tagdop[0] inputs of the core should be tied low. The following connections must be made between the CW4011 core and the synchronous RAM modules.

B.10.8.1 Tag RAM Set 0
DO[19:0] ↔ {dc0tagdop[23:5],dc0tagdop[1]}
1'b0 ↔ dc0tagdop[0]
3'b0 ↔ dc0tagdop[4:2]
DI[19:0] ↔ {dc0tagdip[23:5],dc0tagdip[1]}
(open) ↔ dc0tagdip[0]
(open) ↔ dc0tagdip[4:2]
A[7:0] ↔ dctagaddrp[12:5]
CLK ↔ shell clock
OE[19:0] ↔ high
WE[19:0] ↔ {20{dtweap[1]}}
ENABLE ↔ high

B.10.8.2 Data RAM Bank A
DO[31:0] ↔ dcadatadop[31:0]
DI[31:0] ↔ dcadatadip[31:0]
A[9:0] ↔ dcdataaddrp[12:3]
CLK ↔ shell clock
OE[31:0] ↔ high
WE[31:0] ↔ {{8{dcweap[3]}},{8{dcweap[2]}},{8{dcweap[1]}},{8{dcweap[0]}}}
ENABLE ↔ high

B.10.8.3 Data RAM Bank B
DO[31:0] ↔ dcbdatadop[31:0]
DI[31:0] ↔ dcbdatadip[31:0]
A[9:0] ↔ dcdataaddrp[12:3]
CLK ↔ shell clock
OE[31:0] ↔ high
WE[31:0] ↔ {{8{dcwebp[3]}},{8{dcwebp[2]}},{8{dcwebp[1]}},{8{dcwebp[0]}}}
ENABLE ↔ high

B.10.9 Data Scratchpad RAM
Either or both D-cache sets can be used as data scratchpad RAM whose address is fixed in the memory address space. Scratchpad mode is enabled by setting the SR0 bit to 1 for set 0 or the SR1 bit to 1 for set 1; both the SR0 and SR1 bits are in the CCC register.

If the address is fixed permanently, the tag memory is not necessary; instead, the tag inputs must be tied low or high according to the address mapping. If the address should be programmable, the tag memory must be initialized as valid with the appropriate address before the set is used as scratchpad RAM. This is done with a cache maintenance function, which is enabled by the Isolate Cache (ISC) and Tag bits of the CCC register. For more information, see Section 4.3.10, Configuration and Cache Control (CCC) Register.

The following example has 8 Kbytes of scratchpad RAM only.

B.10.9.1 Tag RAM Set 0
DO[31:13] ↔ {dc0tagdop[23:5],dc0tagdop[1]}
5'b00010 ↔ dc0tagdop[4:0]
(open) ↔ dc0tagdip[23:0]
(open) ↔ dtweap[1]
ENABLE ↔ high

B.10.9.2 Data RAM Bank A
DO[31:0] ↔ dcadatadop[31:0]
DI[31:0] ↔ dcadatadip[31:0]
A[9:0] ↔ dcdataaddrp[12:3]
CLK ↔ shell clock
OE[31:0] ↔ high
WE[31:0] ↔ {{8{dcweap[3]}},{8{dcweap[2]}},{8{dcweap[1]}},{8{dcweap[0]}}}
ENABLE ↔ high
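For a permanently fixed scratchpad address, the tag inputs are simply tied to constants. The following sketch computes such a constant for the 8-Kbyte example. It rests on two assumptions that are inferred rather than stated here: that dc0tagdop[23:5] carries physical address bits [31:13] (reading the DO[31:13] label in B.10.9.1 together with the 8-Kbyte tag hookup in B.10.4.1), and that the 5'b00010 tie-off on dc0tagdop[4:0] marks the line valid. The base address and function names are hypothetical.

```python
SCRATCH_BYTES = 8 * 1024
ADDR_SHIFT = 13          # address bits [31:13] -> dc0tagdop[23:5] (assumed)
LOW_TIEOFF = 0b00010     # dc0tagdop[4:0] tie-off taken from B.10.9.1


def scratchpad_tag_tieoff(base: int) -> int:
    """24-bit constant to tie onto dc0tagdop[23:0] for a fixed 8-Kbyte window."""
    if not 0 <= base < 1 << 32:
        raise ValueError("base must be a 32-bit physical address")
    if base % SCRATCH_BYTES:
        raise ValueError("base must be aligned to the 8-Kbyte scratchpad size")
    return ((base >> ADDR_SHIFT) << 5) | LOW_TIEOFF


def in_scratchpad(addr: int, base: int) -> bool:
    """True if a physical address falls inside the scratchpad window."""
    return base <= addr < base + SCRATCH_BYTES


if __name__ == "__main__":
    print(hex(scratchpad_tag_tieoff(0x00010000)))   # hypothetical base address
```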
B.10.9.3 Data RAM Bank B
DO[31:0] ↔ dcbdatadop[31:0]
DI[31:0] ↔ dcbdatadip[31:0]
A[9:0] ↔ dcdataaddrp[12:3]
CLK ↔ shell clock
OE[31:0] ↔ high
WE[31:0] ↔ {{8{dcwebp[3]}},{8{dcwebp[2]}},{8{dcwebp[1]}},{8{dcwebp[0]}}}
ENABLE ↔ high
Appendix C
Programmer's Notes

This appendix contains information that is useful when writing software for the CW4011 core. The notes are arranged in functional groups: instruction-related, CP0- or TLB-related, and cache-related.

C.1 Instruction-Related Notes
The instruction prior to an ERET must not generate an exception. You can use a NOP (no operation) to make sure that this restriction is met.
The WAITI instruction must be followed by at least one NOP.
Trap instructions must not be placed in branch delay slots.

C.2 CP0- or TLB-Related Notes
When the CW4011 is operating in R3000 exception compatibility mode, the RFE (restore from exception) instruction clears the LL (load linked) bit. This is consistent with ERET operation in R4000 mode.
If a TLB is not present or not enabled in the system, CP0 signals a Coprocessor Unusable exception when an attempt is made to execute any of the TLB maintenance instructions: TLBP, TLBR, TLBWI, or TLBWR.
TLB instructions (TLBP, TLBR, TLBWI, TLBWR) cannot be preceded or followed by a data access instruction (load or store) whose target address requires translation, that is, kseg , kseg2 .
The instruction prior to a TLBWI or TLBWR instruction must not generate an exception. You can use a NOP to make sure that this restriction is met.
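Several of these rules lend themselves to a simple check. The following sketch scans a list of assembly lines, assuming one instruction per line and no labels, and flags a WAITI not followed by a NOP, a trap in the delay slot of a branch or jump, and an ERET, TLBWI, or TLBWR not directly preceded by a NOP. The last rule is deliberately conservative: the notes only require that the preceding instruction not raise an exception, and a NOP is the documented way to guarantee that. The mnemonic sets follow MIPS II naming and are assumptions, not a list taken from this manual.

```python
from typing import List

# MIPS II branch and jump mnemonics whose next instruction is a delay slot.
BRANCHES = {
    "beq", "bne", "blez", "bgtz", "bltz", "bgez", "bltzal", "bgezal",
    "beql", "bnel", "blezl", "bgtzl", "bltzl", "bgezl", "bltzall", "bgezall",
    "j", "jal", "jr", "jalr",
}
TRAPS = {"teq", "tne", "tge", "tgeu", "tlt", "tltu",
         "teqi", "tnei", "tgei", "tgeiu", "tlti", "tltiu"}
# Instructions whose predecessor must not raise an exception (C.1, C.2).
GUARDED = {"eret", "tlbwi", "tlbwr"}


def mnemonic(line: str) -> str:
    fields = line.split()
    return fields[0].lower() if fields else ""


def lint(program: List[str]) -> List[str]:
    """Return one message per rule violation, indexed by instruction."""
    ops = [op for op in (mnemonic(line) for line in program) if op]
    problems = []
    for i, op in enumerate(ops):
        if op == "waiti" and (i + 1 == len(ops) or ops[i + 1] != "nop"):
            problems.append(f"{i}: waiti must be followed by nop")
        if op in TRAPS and i > 0 and ops[i - 1] in BRANCHES:
            problems.append(f"{i}: {op} is in a branch delay slot")
        if op in GUARDED and i > 0 and ops[i - 1] != "nop":
            problems.append(f"{i}: {op} is not preceded by nop")
    return problems


if __name__ == "__main__":
    for msg in lint(["lw $t0, 0($sp)", "eret"]):
        print(msg)
```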
Three instructions are required between an MTC0 instruction that targets any of the TLB support registers (that is, EntryHi, EntryLo, PageMask, and Index) and a subsequent TLBWI or TLBWR instruction. This ensures that the results of the prior MTC0 instruction are seen by the TLB write operation.
Five instructions are required between an MTC0 to the Status register that updates the coprocessor usability field (Status[31:28]) and a subsequent coprocessor instruction that expects to see the updated value.
Seven instructions are required between an MTC0 to the EPC register and a subsequent ERET instruction that expects to see the updated value.

C.3 Cache-Related Notes
When the CW4011 is operating in isolate cache mode, load and store operations to the cache are not allowed in the delay slot of branch likely instructions.

C.4 CW33300-Compatible Debug Extension Notes
The existing CW33300 has some extensions to CP0 that provide enhanced debugging and exception handler support. The CW4011 core remains compatible with these enhancements. Refer to the CW33300 Enhanced Self-Embedding Processor Core User's Manual for further information.
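The MTC0 spacing rules can also be checked mechanically. The following sketch assumes that MTC0 operands name the CP0 register symbolically (EntryHi, Status, EPC, and so on) rather than by number, counts every intervening instruction (including NOPs) toward the required gap, and uses a hypothetical, incomplete pattern to recognize coprocessor 1 to 3 instructions. The rule table mirrors the three, five, and seven instruction requirements above.

```python
import re
from typing import List

TLB_REGS = {"entryhi", "entrylo", "entrylo0", "entrylo1", "pagemask", "index"}
# Hypothetical, incomplete pattern for coprocessor 1-3 instructions.
COPROC = re.compile(r"^(mfc|mtc|cfc|ctc|lwc|swc)[1-3]$")


def consumer_rule(reg: str):
    """Return (minimum gap, consumer predicate) for an MTC0 target, or None."""
    if reg in TLB_REGS:
        return 3, lambda op: op in ("tlbwi", "tlbwr")
    if reg == "status":
        return 5, lambda op: COPROC.match(op) is not None
    if reg == "epc":
        return 7, lambda op: op == "eret"
    return None


def check_cp0_hazards(program: List[str]) -> List[str]:
    """Flag dependent instructions that follow an MTC0 too closely."""
    insns = [line.replace(",", " ").lower().split() for line in program]
    insns = [fields for fields in insns if fields]
    problems = []
    for i, fields in enumerate(insns):
        if fields[0] != "mtc0":
            continue
        rule = consumer_rule(fields[-1])
        if rule is None:
            continue
        need, is_consumer = rule
        # Any consumer at index j <= i + need has fewer than `need` instructions before it.
        for j in range(i + 1, min(len(insns), i + need + 1)):
            if is_consumer(insns[j][0]):
                problems.append(
                    f"{j}: only {j - i - 1} instruction(s) after mtc0 {fields[-1]}, need {need}")
                break
    return problems


if __name__ == "__main__":
    print(check_cp0_hazards(["mtc0 t0, EPC", "nop", "nop", "eret"]))
```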
Glossary

big-endian
A method of data formatting in which each field is addressed by referring to its most-significant byte. For a four-byte singleword, the most-significant byte is byte 03, and the most-significant bit is bit 31. See also little-endian.

bus sizing
The ability of the processor to support and interface with data buses of different sizes.

bus snooping
The method used by the cache controller to monitor memory accesses performed by other bus masters.

direct-map caching
In a direct-mapped cache, each memory location is mapped to exactly one position in the cache. Direct mapping is useful for storing small loops and sequential operations. This type of caching can be used for both the D-cache and the I-cache.

encrypted
Encrypted files are hardware description language (HDL) source files, such as Verilog files, that have been processed so that they are only machine readable. This process gives you access to the behavior of the files but not to the intellectual property associated with them.

fixup cycle
A clock cycle during which the load-miss data is funneled back to the instruction that requires it.

little-endian
A method of data formatting in which each field is addressed by referring to its least-significant byte. For a four-byte singleword, the least-significant byte is byte 00, and the least-significant bit is bit 00. The CW4011 supports both little-endian and big-endian formats. See also big-endian.

placement algorithms
Information is placed in a cache using placement algorithms. These algorithms define the positions in the cache where the information from a particular memory location may be stored. The CW4011 uses two types of algorithm: direct mapping and two-way set associative mapping.
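The two byte orderings in the big-endian and little-endian entries can be seen by serializing one singleword both ways with Python's standard `struct` module. The sketch lists the bytes from the lowest address upward; it illustrates the general conventions only and does not model how the CW4011 selects its mode.

```python
import struct


def byte_layout(word: int, big_endian: bool) -> str:
    """Hex string of the four bytes of `word`, lowest address first."""
    return struct.pack(">I" if big_endian else "<I", word).hex()


if __name__ == "__main__":
    print(byte_layout(0x11223344, big_endian=True))   # 11223344
    print(byte_layout(0x11223344, big_endian=False))  # 44332211
```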
slip condition
A slip condition occurs when the pipeline stalls after the EX stage. In this situation, the previous instruction is executed and clears the pipeline, while the earlier stages of the pipeline remain stalled.

two-way set associative caching
In a two-way set associative cache, each memory location can be stored in one of two possible positions. Two-way set associative mapping is well suited to data references, which tend to be more scattered than instruction references. This type of caching can be used for both the D-cache and the I-cache.

unencrypted
Files that are unencrypted have not been subjected to the processing described in the entry encrypted. These files are human readable and can be written using a text editor.

Verilog model
Verilog is an open-standard hardware description language. A Verilog model represents a design in that language; the term itself gives no indication of the model's level of abstraction.
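To make the placement difference concrete, the following sketch lists the candidate (way, index) slots for an address in a direct-mapped and a two-way set associative cache of the same total size. The 32-byte line size is inferred from the RAM hookups in Appendix B (dctagaddrp starts at bit 5), and no replacement policy is modeled; the names are hypothetical.

```python
LINE_BYTES = 32  # inferred line size


def candidate_slots(addr: int, cache_bytes: int, ways: int):
    """All (way, index) positions where the line holding `addr` may be placed."""
    lines_per_way = cache_bytes // (ways * LINE_BYTES)
    index = (addr // LINE_BYTES) % lines_per_way
    return [(way, index) for way in range(ways)]


if __name__ == "__main__":
    a, b = 0x0040, 0x2040   # 8 Kbytes apart
    # Direct-mapped, 8 Kbytes: both addresses compete for one slot.
    print(candidate_slots(a, 8192, 1), candidate_slots(b, 8192, 1))
    # Two-way, 8 Kbytes: both map to index 2, but each may use either way.
    print(candidate_slots(a, 8192, 2), candidate_slots(b, 8192, 2))
```

In the direct-mapped case the two addresses evict each other on alternating accesses, while the two-way cache can hold both at once, which is why set associativity suits the more scattered data references.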
Customer Feedback
We would appreciate your feedback on this document. Please copy the following page, add your comments, and fax it to us using the information on the following page. If appropriate, please also fax copies of any marked-up pages from this document.
Important: Please include your name, phone number, fax number, and company address so that we may contact you directly for clarification or additional information.
Thank you for your help in improving the quality of our documents.
Reader's Comments
Fax your comments to: LSI Logic Corporation, Technical Publications, M/S E-198, Fax: 408.433.4333

Please tell us how you rate this document: MiniRISC CW4011 Superscalar Microprocessor Core Technical Manual. Place a check mark in the appropriate blank for each category.

                                           Excellent  Good  Average  Fair  Poor
Completeness of information                  ____     ____   ____    ____  ____
Clarity of information                       ____     ____   ____    ____  ____
Ease of finding information                  ____     ____   ____    ____  ____
Technical content                            ____     ____   ____    ____  ____
Usefulness of examples and illustrations     ____     ____   ____    ____  ____
Overall manual                               ____     ____   ____    ____  ____

What could we do to improve this document?

If you found errors in this document, please specify the error and page number. If appropriate, please fax a marked-up copy of the page(s).

Please complete the information below so that we may contact you directly for clarification or additional information.

Name                      Date
Telephone                 Fax
Title
Department                Mail Stop
Company Name
Street
City, State, Zip
